Abstract

We give a "regularity lemma" for degree-$d$ polynomial threshold functions (PTFs) over the Boolean cube $\{-1,1\}^n$. Roughly speaking, this result shows that every degree-$d$ PTF can be decomposed into a constant number of subfunctions such that almost all of the subfunctions are close to being regular PTFs. Here a "regular" PTF is a PTF $\mathrm{sign}(p(x))$ where the influence of each variable on the polynomial $p(x)$ is a small fraction of the total influence of $p$.

As an application of this regularity lemma, we prove that for any constants $d \geq 1$, $\epsilon > 0$, every degree-$d$ PTF over $n$ variables can be approximated to accuracy $\epsilon$ by a constant-degree PTF that has integer weights of total magnitude $O(n^d)$. This weight bound is shown to be optimal up to logarithmic factors.

1 Introduction

A polynomial threshold function (henceforth PTF) is a Boolean function $f : \{-1,1\}^n \to \{-1,1\}$, $f(x) = \mathrm{sign}(p(x))$, where $p : \{-1,1\}^n \to \mathbb{R}$ is a polynomial with real coefficients. If $p$ has degree $d$, we say that $f$ is a degree-$d$ PTF. Low-degree PTFs are a natural generalization of linear threshold functions (the case $d = 1$) and hence are of significant interest in complexity theory, see e.g. [ABFR94, Bru90, OS03b, OS03a, DRST09, GL94, HKM09, MZ09, Sak93, She09] and many other works.

The influence of coordinate $i$ on a function $g : \{-1,1\}^n \to \mathbb{R}$ measures the extent to which $x_i$ affects the output of $g$. More precisely, we have $\mathrm{Inf}_i(g) = \sum_{S \ni i} \hat{g}(S)^2$, where $\sum_{S \subseteq [n]} \hat{g}(S)\chi_S(x)$ is the Fourier expansion of $g$. The total influence of $g$ is the sum of all $n$ coordinate influences, $\mathrm{Inf}(g) = \sum_{i=1}^{n} \mathrm{Inf}_i(g)$. See [O'D07a, KS06] for background on influences.
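As a concrete illustration of these definitions, the following Python sketch computes Fourier coefficients and influences by brute force for a tiny example. The function names, the 0-based coordinate indexing and the example polynomial are illustrative assumptions and do not come from the paper; the enumeration takes time exponential in $n$ and is meant only for very small cases.

```python
from itertools import combinations, product
from math import prod


def fourier_coefficients(g, n):
    """Return {S: g_hat(S)} for g: {-1,1}^n -> R by brute-force averaging."""
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            # g_hat(S) = E_x[g(x) * chi_S(x)], with chi_S(x) = prod_{i in S} x_i
            total = sum(g(x) * prod(x[i] for i in S) for x in points)
            coeffs[S] = total / len(points)
    return coeffs


def influences(coeffs, n):
    """Inf_i(g) = sum of g_hat(S)^2 over all sets S containing i."""
    return [sum(c * c for S, c in coeffs.items() if i in S) for i in range(n)]


# Toy example (an assumption): g(x) = x_0 + 0.5 * x_1 * x_2.
g = lambda x: x[0] + 0.5 * x[1] * x[2]
coeffs = fourier_coefficients(g, 3)
inf = influences(coeffs, 3)
total_influence = sum(inf)
```

In this toy example the only nonzero coefficients sit on $\{0\}$ and $\{1,2\}$, so coordinate 0 carries most of the total influence.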

We say that a polynomial $p : \{-1,1\}^n \to \mathbb{R}$ is "$\tau$-regular" if the influence of each coordinate on $p$ is at most a $\tau$ fraction of $p$'s total influence (see Section 2 for a more detailed definition). A PTF $f$ is said to be $\tau$-regular if $f = \mathrm{sign}(p)$, where $p$ is $\tau$-regular. Roughly speaking, regular polynomials and PTFs are useful because they inherit some nice properties of PTFs and polynomials over Gaussian (rather than Boolean) inputs; this intuition can be made precise using the "invariance principle" of Mossel et al. [MOO05]. This point of view has been useful in the $d = 1$ case for constructing pseudorandom generators [DGJ+09], low-weight approximators [Ser07, DS09], and other results for LTFs [OS08, MORS09].

1.1 Our results 

A regularity lemma for degree-d PTFs. A number of useful results in different areas, loosely referred 
to as "regularity lemmas," show that for various types of combinatorial objects an arbitrary object can 
be approximately decomposed into a constant number of "pseudorandom" sub-objects. The best-known 
example of such a result is Szemerédi's classical regularity lemma for graphs [Sze78], which (roughly) says
that any graph can be decomposed into a constant number of subsets such that almost every pair of subsets 
induces a "pseudorandom" bipartite graph. Another example is Green's recent regularity lemma for Boolean 
functions [Gre05]. Results of this sort are useful because different properties of interest are sometimes 
easier to establish for pseudorandom objects, and via regularity lemmas it may be possible to prove the 
corresponding theorems for general objects. We note also that results of this sort play an important part in 
the "structure versus randomness" paradigm that has been prominent in recent work in combinatorics and 
number theory, see e.g. [Tao07].

We prove a structural result about degree-d PTFs which follows the above pattern; we thus refer to it as 
a "regularity lemma for degree-d PTFs." Our result says that any low-degree PTF can be decomposed as a 
small depth decision tree, most of whose leaves are close to regular PTFs: 

Theorem 1. Let $f(x) = \mathrm{sign}(p(x))$ be any degree-$d$ PTF. Fix any $\tau > 0$. Then $f$ is equivalent to a decision tree $\mathcal{T}$, of depth

$$\mathrm{depth}(d,\tau) := \frac{1}{\tau} \cdot \left(d \log \frac{1}{\tau}\right)^{O(d)}$$

with variables at the internal nodes and a degree-$d$ PTF $f_\rho = \mathrm{sign}(p_\rho)$ at each leaf $\rho$, with the following property: with probability at least $1 - \tau$, a random path^1 from the root reaches a leaf $\rho$ such that $f_\rho$ is $\tau$-close to some $\tau$-regular degree-$d$ PTF.

Regularity is a natural way to capture the notion of pseudorandomness for PTFs, and results of interest can be easier to establish for regular PTFs than for arbitrary PTFs (this is the case for our main application, constructing low-weight approximators, as we describe below). Our regularity lemma provides a general tool to reduce questions about arbitrary PTFs to regular PTFs; it has already been used in this way as an essential ingredient in the recent proof that bounded independence fools all degree-2 PTFs [DKN09]. We note that the recent construction of pseudorandom generators for degree-$d$ PTFs of [MZ09] also crucially uses a decomposition result which is very similar to our regularity lemma; we discuss the relation between our work and [MZ09] in more detail below and in Appendix C.

^1 A random path corresponds to the standard uniform random walk on the tree.

Application: Every low-degree PTF has a low-weight approximator. [Ser07] showed that every linear threshold function (LTF) over $\{-1,1\}^n$ can be $\epsilon$-approximated by an LTF with integer weights $w_1, \ldots, w_n$ such that $\sum_i w_i^2 = n \cdot 2^{\tilde{O}(1/\epsilon^2)}$. (Here and throughout the paper we say that $g$ is an $\epsilon$-approximator for $f$ if $f$ and $g$ differ on at most $\epsilon 2^n$ inputs from $\{-1,1\}^n$.) This result and the tools used in its proof found several subsequent applications in complexity theory and learning theory, see e.g. [DGJ+09, OS08].

We apply our regularity lemma for degree-$d$ PTFs to prove an analogue of the [Ser07] result for low-degree polynomial threshold functions. Our result implies that for any constants $d$, $\epsilon$, any degree-$d$ PTF has an $\epsilon$-approximating PTF of constant degree and (integer) weight $O(n^d)$.

When we refer to the weight of a PTF / = sign(p(x)), we assume that all the coefficients of p are 
integers; by "weight" we mean the sum of the squares of p's coefficients. We prove 

Theorem 2. Let $f(x) = \mathrm{sign}(p(x))$ be any degree-$d$ PTF. Fix any $\epsilon > 0$. Then there is a polynomial $q(x)$ of degree $D = (d/\epsilon)^{O(d)}$ and weight $2^{(d/\epsilon)^{O(d)}} \cdot n^d$ such that $\mathrm{sign}(q(x))$ is $\epsilon$-close to $f$.

A result on the existence of low-weight $\epsilon$-approximators for PTFs is implicit in the recent work [DRST09]. They show that any degree-$d$ PTF $f$ has Fourier concentration $\sum_{|S| \geq 1/\epsilon^{O(d)}} \hat{f}(S)^2 \leq \epsilon$, and this easily implies that $f$ can be $\epsilon$-approximated by a PTF with integer weights. (Indeed, recall that the learning algorithm of [LMN93] works by constructing such a PTF as its hypothesis.) The above Fourier concentration bound implies that there is a PTF of degree $1/\epsilon^{O(d)}$ and weight $n^{1/\epsilon^{O(d)}}$ which $\epsilon$-approximates $f$. In contrast, our Theorem 2 can give a weaker degree bound (if $d = \ldots$) but always gives a much stronger weight bound in terms of the dependence on $n$. We mention here that Podolskii [Pod09] has shown that for every constant $d \geq 2$, there is a degree-$d$ PTF for which any exact representation requires weight $n^{\Omega(n^d)}$.

We also prove lower bounds showing that weight $\tilde{\Omega}(n^d)$ is required to $\epsilon$-approximate degree-$d$ PTFs for sufficiently small constant $\epsilon$; see Section 3.3.

Techniques. An important ingredient in our proof of Theorem 1 is a case analysis based on the "critical index" of a degree-$d$ polynomial (see Section 2 for a formal definition). The critical index measures the first point (going through the variables from most to least influential) at which the influences "become small;" it is a natural generalization of the definition of the "critical index" of a linear form [Ser07] that has been useful in several subsequent works [OS08, DGJ+09, DS09]. Roughly speaking we show that

• If the critical index of $p$ is large, then a random restriction fixing few variables (the variables with largest influence in $p$) causes $\mathrm{sign}(p)$ to become a close-to-constant function with non-negligible probability; see Section 2.1. (Note that a constant function is trivially a regular PTF.)

• If the critical index of $p$ is positive but small, then a random restriction as described above causes $p$ to become regular with non-negligible probability; see Section 2.2.

• If the critical index of $p$ is zero, then $p$ is already a regular polynomial as desired.

Related Work. The results of Sections 2.1 and 2.2 strengthen earlier results with a similar flavor in [DRST09]. Those earlier results had quantitative bounds that depended on $n$ in various ways: getting rid of this dependence is essential for our low-weight approximator application and for the application in [DKN09].

Simultaneously and independently of this work, Ben-Eliezer et al. [BELY09], Harsha et al. [HKM09], and Meka and Zuckerman [MZ09] have proved similar structural results for PTFs. In particular, [HKM09] give a result which is very similar to Lemma 11, the main component in our proof of Theorem 1. By applying the result from [HKM09], Meka and Zuckerman [MZ09] give a result which is quite similar to our Theorem 1. However, their definition of regularity is somewhat different from ours, and as a consequence their structural results and ours are quantitatively incomparable, as we discuss in Appendix C. Ben-Eliezer et al. give a result of similar flavor to our Theorem 1. They establish the existence of a decision tree such that most leaves are $\tau$-regular (as opposed to $\tau$-close to being $\tau$-regular). The depth of their tree is exponential in $1/\tau$, which makes it quantitatively weaker for applications.

1.2 Preliminaries 

We start by establishing some basic notation. We write $[n]$ to denote $\{1, 2, \ldots, n\}$ and $[k, \ell]$ to denote $\{k, k+1, \ldots, \ell\}$. We write $\mathbf{E}[X]$ and $\mathrm{Var}[X]$ to denote expectation and variance of a random variable $X$, where the underlying distribution will be clear from the context. For $x \in \{-1,1\}^n$ and $A \subseteq [n]$ we write $x_A$ to denote $(x_i)_{i \in A}$.

We assume familiarity with the basic elements of Fourier analysis over $\{-1,1\}^n$; a concise review of the necessary background is given in Appendix A.1. Let $p : \{-1,1\}^n \to \mathbb{R}$ and $p(x) = \sum_{S} \hat{p}(S)\chi_S(x)$ be its Fourier expansion. The influence of variable $i$ on $p$ is $\mathrm{Inf}_i(p) := \sum_{S \ni i} \hat{p}(S)^2$, and the total influence of $p$ is $\mathrm{Inf}(p) = \sum_{i=1}^{n} \mathrm{Inf}_i(p)$. For a function $f : \{-1,1\}^n \to \mathbb{R}$ and $q \geq 1$, we denote by $\|f\|_q$ its $\ell_q$ norm, i.e. $\|f\|_q := \mathbf{E}_x[|f(x)|^q]^{1/q}$, where the intended distribution over $x$ will always be uniform over $\{-1,1\}^n$.

For Boolean functions $f, g : \{-1,1\}^n \to \{-1,1\}$ the distance between $f$ and $g$, denoted $\mathrm{dist}(f, g)$, is $\Pr_x[f(x) \neq g(x)]$, where the probability is over uniform $x \in \{-1,1\}^n$.
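The distance just defined can be evaluated exactly for small $n$ by enumerating the cube. The sketch below does this in Python; the helper names, the convention $\mathrm{sign}(0) = +1$ and the two example functions are assumptions made for illustration only.

```python
from itertools import product


def sign(v):
    # Convention assumed here: sign(0) = +1.
    return 1 if v >= 0 else -1


def dist(f, g, n):
    """dist(f, g) = Pr_x[f(x) != g(x)] over uniform x in {-1,1}^n."""
    points = product((-1, 1), repeat=n)
    return sum(f(x) != g(x) for x in points) / 2 ** n


# Two degree-1 PTFs (LTFs) on three variables.
majority = lambda x: sign(x[0] + x[1] + x[2])
dictator = lambda x: sign(x[0])
d = dist(majority, dictator, 3)
```

Here the two functions disagree exactly when the other two coordinates both oppose $x_0$, which happens on 2 of the 8 inputs.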

Our proofs will use various bounds from probability theory, which we collect for easy reference in Appendices A.2 and A.3. We call the reader's attention in particular to Theorem 7; throughout the paper, $C$ (which will be seen to play an important role in our proofs) denotes $C_0^2$, where $C_0$ is the universal constant from that theorem.

2 Main Result: a regularity lemma for low-degree PTFs 

Let $f : \{-1,1\}^n \to \{-1,1\}$ be a degree-$d$ PTF. Fix a representation $f(x) = \mathrm{sign}(p(x))$, where $p : \{-1,1\}^n \to \mathbb{R}$ is a degree-$d$ polynomial which (w.l.o.g.) we may take to have $\mathrm{Var}[p] = 1$. We assume w.l.o.g. that the variables are ordered in such a way that $\mathrm{Inf}_i(p) \geq \mathrm{Inf}_{i+1}(p)$ for all $i \in [n-1]$.

We now define the notion of the $\tau$-critical index of a polynomial [DRST09] and state its basic properties.

Definition 1. Let $p : \{-1,1\}^n \to \mathbb{R}$ and $\tau > 0$. Assume the variables are ordered such that $\mathrm{Inf}_j(p) \geq \mathrm{Inf}_{j+1}(p)$ for all $j \in [n-1]$. The $\tau$-critical index of $p$ is the least $i$ such that:

$$\mathrm{Inf}_{i+1}(p) \leq \tau \cdot \sum_{j=i+1}^{n} \mathrm{Inf}_j(p). \qquad (1)$$

If (1) does not hold for any $i$ we say that the $\tau$-critical index of $p$ is $+\infty$. If $p$ has $\tau$-critical index 0, we say that $p$ is $\tau$-regular.

Note that if $p$ is a $\tau$-regular polynomial then $\max_i \mathrm{Inf}_i(p) \leq d\tau$ since the total influence of $p$ is at most $d$. If $f(x) = \mathrm{sign}(p(x))$, we say $f$ is $\tau$-regular when $p$ is $\tau$-regular, and we take the $\tau$-critical index of $f$ to be that of $p$.^2 The following lemma (see Appendix B for the easy proof) says that the total influence $\sum_{i=j+1}^{n} \mathrm{Inf}_i(p)$ goes down geometrically as a function of $j$ prior to the critical index:

Lemma 1. Let $p : \{-1,1\}^n \to \mathbb{R}$ and $\tau > 0$. Let $k$ be the $\tau$-critical index of $p$. For $j \in [0, k]$ we have

$$\sum_{i=j+1}^{n} \mathrm{Inf}_i(p) \leq (1-\tau)^j \cdot \mathrm{Inf}(p).$$
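To make Definition 1 and Lemma 1 concrete, the following Python sketch computes the $\tau$-critical index from a sorted list of influences and checks the geometric decay of the tail on a toy input. The function names and the numerical values are hypothetical, chosen only for illustration; `influences[i]` stands for $\mathrm{Inf}_{i+1}(p)$ in the paper's 1-based indexing.

```python
def critical_index(influences, tau):
    """Least i with Inf_{i+1} <= tau * sum_{j >= i+1} Inf_j, or None.

    `influences` must be sorted in non-increasing order. None plays the
    role of the '+infinity' case of Definition 1.
    """
    assert all(a >= b for a, b in zip(influences, influences[1:]))
    tail = sum(influences)  # sum_{j=i+1}^n Inf_j for i = 0
    for i, inf_next in enumerate(influences):
        if inf_next <= tau * tail:
            return i
        tail -= inf_next  # drop Inf_{i+1} before testing i + 1
    return None


def tail_decays_geometrically(influences, tau, k):
    """Check the bound of Lemma 1 for every j in [0, k]."""
    total = sum(influences)
    return all(
        sum(influences[j:]) <= (1 - tau) ** j * total + 1e-12
        for j in range(k + 1)
    )


# Toy influence profile (an assumption, not taken from the paper).
toy = [0.5, 0.25, 0.125, 0.01, 0.01, 0.01, 0.01, 0.01]
k = critical_index(toy, 0.3)
ok = tail_decays_geometrically(toy, 0.3, k)
```

On this input the three largest influences each exceed a $0.3$ fraction of the remaining tail, so the critical index is 3; a profile with critical index 0 is $\tau$-regular.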

^2 Strictly speaking, $\tau$-regularity is a property of a particular representation and not of a PTF $f$, which could have many different representations. The particular representation we are concerned with will always be clear from context.

We will use the fact that in expectation, the influence of an unrestricted variable in a polynomial does not change under random restrictions (again see Appendix B for the easy proof):



Lemma 2. Let $p : \{-1,1\}^n \to \mathbb{R}$. Consider a random assignment $\rho$ to the variables $x_1, \ldots, x_k$ and fix $\ell \in [k+1, n]$. Then $\mathbf{E}_\rho[\mathrm{Inf}_\ell(p_\rho)] = \mathrm{Inf}_\ell(p)$.

Notation: For $S \subseteq [n]$, we write "$\rho$ fixes $S$" to indicate that $\rho \in \{-1,1\}^{|S|}$ is a restriction mapping $x_S$, i.e. each coordinate in $S$, to either $-1$ or $1$ and leaving coordinates not in $S$ unrestricted.
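Lemma 2 can be checked numerically on a small example. The Python sketch below represents a polynomial by its Fourier coefficients, applies every restriction of the first $k$ variables, and averages the resulting influence. The representation (a dict keyed by frozensets of 0-based indices), the helper names and the example polynomial are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
from math import prod


def restrict(coeffs, rho):
    """Fourier coefficients of p_rho, where rho fixes variables 0..len(rho)-1."""
    k = len(rho)
    out = defaultdict(float)
    for S, c in coeffs.items():
        head = [i for i in S if i < k]
        tail = frozenset(i for i in S if i >= k)
        # chi_S(rho, x_T) = (prod of rho_i over the head part) * chi_tail(x_T)
        out[tail] += c * prod(rho[i] for i in head)
    return dict(out)


def influence(coeffs, i):
    return sum(c * c for S, c in coeffs.items() if i in S)


def expected_influence(coeffs, k, ell):
    """E_rho[Inf_ell(p_rho)] over uniform rho fixing the first k variables."""
    rhos = list(product((-1, 1), repeat=k))
    return sum(influence(restrict(coeffs, rho), ell) for rho in rhos) / len(rhos)


# Toy degree-2 polynomial on 4 variables (an assumption).
p = {
    frozenset({0}): 0.5,
    frozenset({0, 2}): 0.5,
    frozenset({1, 2}): -0.5,
    frozenset({2, 3}): 0.25,
    frozenset({3}): 0.25,
}
avg = expected_influence(p, 2, 2)
```

Individual restrictions change the influence of coordinate 2 substantially in this example (it is $0.0625$ when $\rho_0 = \rho_1$ and $1.0625$ otherwise), yet the average equals the original influence, as the lemma states.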

2.1 The large critical index case 

The main result of this section is Lemma 3, which says that if the critical index of $f$ is large, then a noticeable fraction of restrictions $\rho$ of the high-influence variables cause $f_\rho$ to become close to a constant function.

Lemma 3. Let $f : \{-1,1\}^n \to \{-1,1\}$ be a degree-$d$ PTF $f = \mathrm{sign}(p)$. Fix $\beta > 0$ and suppose that $f$ has $\tau$-critical index at least $K := \alpha/\tau$, where $\alpha = \Theta(d \log\log(1/\beta) + d \log d)$. Then, for at least a $1/(2C^d)$ fraction of restrictions $\rho$ fixing $[K]$, the function $f_\rho$ is $\beta$-close to a constant function.



Proof. Partition the coordinates into a "head" part $H := [K]$ (the high-influence coordinates) and a "tail" part $T = [n] \setminus H$. We can write $p(x) = p(x_H, x_T) = p'(x_H) + q(x_H, x_T)$, where $p'(x_H)$ is the truncation of $p$ comprising only the monomials all of whose variables are in $H$, i.e. $p'(x_H) = \sum_{S \subseteq H} \hat{p}(S)\chi_S(x_H)$.

Now consider a restriction $\rho$ of $H$ and the corresponding polynomial $p_\rho(x_T) = p(\rho, x_T)$. It is clear that the constant term of this polynomial is exactly $p'(\rho)$. To prove the lemma, we will show that for at least a $1/(2C^d)$ fraction of all $\rho \in \{-1,1\}^K$, the (restricted) degree-$d$ PTF $f_\rho(x_T) = \mathrm{sign}(p_\rho(x_T))$ satisfies $\Pr_{x_T}[f_\rho(x_T) \neq \mathrm{sign}(p'(\rho))] \leq \beta$. Let us define the notion of a good restriction:

Definition 2. A restriction $\rho \in \{-1,1\}^K$ that fixes $H$ is called good iff the following two conditions are simultaneously satisfied: (i) $|p'(\rho)| \geq t^* := 1/(2C^d)$, and (ii) $\|q(\rho, x_T)\|_2 \leq t^* \cdot (\Theta(\log(1/\beta)))^{-d/2}$.

Intuitively condition (i) says that the constant term $p'(\rho)$ of $p_\rho$ has "large" magnitude, while condition (ii) says that the polynomial $q(\rho, x_T)$ has "small" $\ell_2$-norm. We claim that if $\rho$ is a good restriction then the degree-$d$ PTF $f_\rho$ satisfies $\Pr_{x_T}[f_\rho(x_T) \neq \mathrm{sign}(p'(\rho))] \leq \beta$. To see this claim, note that for any fixed $\rho$ we have $f_\rho(x_T) \neq \mathrm{sign}(p'(\rho))$ only if $|q(\rho, x_T)| \geq |p'(\rho)|$, so to show this claim it suffices to show that if $\rho$ is a good restriction then $\Pr_{x_T}[|q(\rho, x_T)| \geq |p'(\rho)|] \leq \beta$. But for $\rho$ a good restriction, by conditions (i) and (ii) we have

$$\Pr_{x_T}\big[|q(\rho, x_T)| \geq |p'(\rho)|\big] \leq \Pr_{x_T}\big[|q(\rho, x_T)| \geq \|q(\rho, x_T)\|_2 \cdot (\Theta(\log(1/\beta)))^{d/2}\big],$$

which is at most $\beta$ by the concentration bound (Theorem 6), as desired. So the claim holds: if $\rho$ is a good restriction then $f_\rho(x_T)$ is $\beta$-close to $\mathrm{sign}(p'(\rho))$. Thus to prove Lemma 3 it remains to show that at least a $1/(2C^d)$ fraction of all restrictions $\rho$ to $H$ are good.

We prove this in two steps. First we show (Lemma 4) that the polynomial $p'$ is not too concentrated: with probability at least $1/C^d$ over $\rho$, condition (i) of Definition 2 is satisfied. We then show (Lemma 5) that the polynomial $q$ is highly concentrated: the probability (over $\rho$) that condition (ii) is not satisfied is at most $1/(2C^d)$. Lemma 3 then follows by a union bound.

Lemma 4. We have that $\Pr_\rho[|p'(\rho)| \geq t^*] \geq 1/C^d$.

Proof. Using the fact that the critical index of $p$ is large, we will show that the polynomial $p'$ has large variance (close to 1), and hence we can apply the anti-concentration bound Theorem 7.

We start by establishing that $\mathrm{Var}[p']$ lies in the range $[1/2, 1]$. To see this, first recall that for $g : \{-1,1\}^n \to \mathbb{R}$ we have $\mathrm{Var}[g] = \sum_{\emptyset \neq S \subseteq [n]} \hat{g}(S)^2$. It is thus clear that $\mathrm{Var}[p'] \leq \mathrm{Var}[p] = 1$. To establish the lower bound we use the property that the "tail" $T$ has "very small" influence in $p$, which is a consequence of the critical index of $p$ being large. More precisely, Lemma 1 yields

$$\sum_{i \in T} \mathrm{Inf}_i(p) \leq (1-\tau)^K \cdot \mathrm{Inf}(p) = (1-\tau)^{\alpha/\tau} \cdot \mathrm{Inf}(p) \leq d \cdot e^{-\alpha} \qquad (2)$$

where the last inequality uses the fact that $\mathrm{Inf}(p) \leq d$. Therefore, we have:

$$\mathrm{Var}[p'] = \mathrm{Var}[p] - \sum_{T \cap S \neq \emptyset,\, S \subseteq [n]} \hat{p}(S)^2 \geq 1 - \sum_{i \in T} \mathrm{Inf}_i(p) \geq 1 - d e^{-\alpha} \geq 1/2$$

where the first inequality uses the fact that $\mathrm{Inf}_i(p) = \sum_{i \in S \subseteq [n]} \hat{p}(S)^2$, the second follows from (2) and the third from our choice of $\alpha$. We have thus established that indeed $\mathrm{Var}[p'] \in [1/2, 1]$.

At this point, we would like to apply Theorem 7 for $p'$. Note however that $\mathbf{E}[p'] = \mathbf{E}[p] = \hat{p}(\emptyset)$, which is not necessarily zero. To address this minor technical point we apply Theorem 7 twice: once for the polynomial $p'' = p' - \hat{p}(\emptyset)$ and once for $-p''$. (Clearly, $\mathbf{E}[p''] = 0$ and $\mathrm{Var}[p''] = \mathrm{Var}[p'] \in [1/2, 1]$.) We thus get that, independent of the value of $\hat{p}(\emptyset)$, we have $\Pr_\rho[|p'(\rho)| \geq 2^{-1/2} \cdot C^{-d}] \geq C^{-d}$, as desired. □

Lemma 5. We have that $\Pr_\rho\big[\|q(\rho, x_T)\|_2 > t^* \cdot (\Theta(\log(1/\beta)))^{-d/2}\big] \leq 1/(2C^d)$.

Proof. To obtain the desired concentration bound we must show that the degree-$2d$ polynomial $Q(\rho) := \|q(\rho, x_T)\|_2^2$ has "small" variance. The desired bound then follows by an application of Theorem 6.

We thus begin by showing that $\|Q\|_2 \leq 3^d d e^{-\alpha}$. To see this, we first note that $Q(\rho) = \sum_{\emptyset \neq S \subseteq T} \hat{p_\rho}(S)^2$. Hence an application of the triangle inequality for norms and hypercontractivity (Theorem 5) yields:

$$\|Q\|_2 \leq \sum_{\emptyset \neq S \subseteq T} \big\|\hat{p_\rho}(S)^2\big\|_2 \leq 3^d \sum_{\emptyset \neq S \subseteq T} \big\|\hat{p_\rho}(S)^2\big\|_1.$$

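We now proceed to bound from above the RHS term by term:

$$\sum_{\emptyset \neq S \subseteq T} \big\|\hat{p_\rho}(S)^2\big\|_1 = \sum_{\emptyset \neq S \subseteq T} \mathbf{E}_\rho\big[\hat{p_\rho}(S)^2\big] = \mathbf{E}_\rho\Big[\sum_{\emptyset \neq S \subseteq T} \hat{p_\rho}(S)^2\Big] \leq \mathbf{E}_\rho[\mathrm{Inf}(p_\rho)] = \mathbf{E}_\rho\Big[\sum_{i \in T} \mathrm{Inf}_i(p_\rho)\Big]$$

$$= \sum_{i \in T} \mathbf{E}_\rho[\mathrm{Inf}_i(p_\rho)] = \sum_{i \in T} \mathrm{Inf}_i(p) \leq d e^{-\alpha} \qquad (3)$$

where the first inequality uses the fact $\mathrm{Inf}(p_\rho) \geq \sum_{\emptyset \neq S \subseteq T} \hat{p_\rho}(S)^2$, the second equality in (3) follows from Lemma 2, and the last inequality is Equation (2). We have thus shown that $\|Q\|_2 \leq 3^d d e^{-\alpha}$.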

We now upper bound $\Pr_\rho[Q(\rho) > (t^*)^2 \cdot \Theta(\log(1/\beta))^{-d}]$. Since $\|Q\|_2 \leq 3^d d e^{-\alpha}$, Theorem 6 implies that for all $t > e^d$ we have $\Pr_\rho[Q(\rho) > t \cdot 3^d d e^{-\alpha}] \leq \exp(-\Omega(t^{1/d}))$. Taking $t$ to be $\Theta(d^d \ln^d C)$, this upper bound is at most $1/(2C^d)$. Our choice of the parameter $\alpha$ gives $t \cdot d 3^d \cdot e^{-\alpha} \leq (t^*)^2 \cdot \Theta(\log(1/\beta))^{-d}$. This completes the proof of Lemma 5, and thus also the proof of Lemma 3. □



2.2 The small critical index case 

In this section we show that if the critical index of $p$ is "small", then a random restriction of "few" variables causes $p$ to become regular with non-negligible probability. We do this by showing that no matter what the critical index is, a random restriction of all variables up to the $\tau$-critical index causes $p$ to become $\tau'$-regular, for some $\tau'$ not too much larger than $\tau$, with probability at least $1/(2C^d)$. More formally, we prove:

Lemma 6. Let $p : \{-1,1\}^n \to \mathbb{R}$ be a degree-$d$ polynomial with $\tau$-critical index $k \in [n]$. Let $\rho$ be a random restriction that fixes $[k]$, and let $\tau' = (C' \cdot d \ln d \cdot \ln\frac{1}{\tau})^d \cdot \tau$ for some suitably large absolute constant $C'$. With probability at least $1/(2C^d)$ over the choice of $\rho$, the restricted polynomial $p_\rho$ is $\tau'$-regular.

Proof. We must show that with probability at least $1/(2C^d)$ over $\rho$ the restricted polynomial $p_\rho$ satisfies

$$\mathrm{Inf}_\ell(p_\rho) \Big/ \sum_{j=k+1}^{n} \mathrm{Inf}_j(p_\rho) \leq \tau' \qquad (4)$$

for all $\ell \in [k+1, n]$. Note that before the restriction, we have $\mathrm{Inf}_\ell(p) \leq \tau \cdot \sum_{j=k+1}^{n} \mathrm{Inf}_j(p)$ for all $\ell \in [k+1, n]$ because the $\tau$-critical index of $p$ is $k$.

Let us give an intuitive explanation of the proof. We first show (Lemma 7) that with probability at least $C^{-d}$ the denominator in (4) does not decrease under a random restriction. This is an anti-concentration statement that follows easily from Theorem 7. We then show (Lemma 8) that with probability at least $1 - C^{-d}/2$ the numerator in (4) does not increase by much under a random restriction, i.e. no variable influence $\mathrm{Inf}_\ell(p_\rho)$, $\ell \in [k+1, n]$, becomes too large. Thus both events occur (and $p_\rho$ is $\tau'$-regular) with probability at least $C^{-d}/2$.

We note that while each individual influence $\mathrm{Inf}_\ell(p_\rho)$ is indeed concentrated around its expectation (see Claim 9), we need a concentration statement for $n - k$ such influences. This might seem difficult to achieve since we require bounds that are independent of $n$. We get around this difficulty by a "bucketing" argument that exploits the fact (at many different scales) that all but a few influences $\mathrm{Inf}_\ell(p)$ must be "very small."

It remains to state and prove Lemmas 7 and 8. Consider the event $\mathcal{E} := \{\rho \in \{-1,1\}^k \mid \sum_{\ell=k+1}^{n} \mathrm{Inf}_\ell(p_\rho) \geq \sum_{\ell=k+1}^{n} \mathrm{Inf}_\ell(p)\}$. We first show:

Lemma 7. $\Pr_\rho[\mathcal{E}] \geq C^{-d}$.

Proof. It follows from Fact 14 and the Fourier expression of $\mathrm{Inf}_\ell(p_\rho)$ that $A(\rho) := \sum_{\ell=k+1}^{n} \mathrm{Inf}_\ell(p_\rho)$ is a degree-$2d$ polynomial. By Lemma 2 we get that $\mathbf{E}_\rho[A] = \sum_{\ell=k+1}^{n} \mathrm{Inf}_\ell(p) > 0$. Also observe that $A(\rho) \geq 0$ for all $\rho \in \{-1,1\}^k$. We may now apply Theorem 7 to the polynomial $A' = A - \mathbf{E}_\rho[A]$, to obtain:

$$\Pr_\rho[\mathcal{E}] = \Pr[A' \geq 0] \geq \Pr[A' \geq C_0^{-2d} \cdot \sigma(A')] \geq C_0^{-2d} = C^{-d}. \qquad \square$$

We now turn to Lemma 8. Consider the event $\mathcal{J} := \{\rho \in \{-1,1\}^k \mid \max_{\ell \in [k+1,n]} \mathrm{Inf}_\ell(p_\rho) > \tau' \sum_{j=k+1}^{n} \mathrm{Inf}_j(p)\}$. We show:

Lemma 8. $\Pr_\rho[\mathcal{J}] \leq (1/2) \cdot C^{-d}$.

The rest of this subsection consists of the proof of Lemma 8. A useful intermediate claim is that the influences of individual variables do not increase by a lot under a random restriction (note that this claim, proved in Appendix B, does not depend on the value of the critical index):

Claim 9. Let $p : \{-1,1\}^n \to \mathbb{R}$ be a degree-$d$ polynomial. Let $\rho$ be a random restriction fixing $[j]$. Fix any $t > e^{2d}$ and any $\ell \in [j+1, n]$. With probability at least $1 - \exp(-\Omega(t^{1/d}))$ over $\rho$, we have $\mathrm{Inf}_\ell(p_\rho) \leq 3^d t \, \mathrm{Inf}_\ell(p)$.

Claim 9 says that for any given coordinate, the probability that its influence after a random restriction increases by a $t$ factor decreases exponentially in $t$. Note that Claim 9 and a naive union bound over all coordinates in $[k+1, n]$ does not suffice to prove Lemma 8. Instead, we proceed as follows: We partition the coordinates in $[k+1, n]$ into "buckets" according to their influence in the tail of $p$. In particular, the $i$-th bucket ($i \geq 0$) contains all variables $\ell \in [k+1, n]$ such that

$$\mathrm{Inf}_\ell(p) \Big/ \sum_{j=k+1}^{n} \mathrm{Inf}_j(p) \in [\tau/2^{i+1}, \tau/2^{i}].$$

We analyze the effect of a random restriction $\rho$ on the variables of each bucket $i$ separately and then conclude by a union bound over all the buckets.

So fix a bucket $i$. Note that, by definition, the number of variables in the $i$-th bucket is at most $2^{i+1}/\tau$. We bound from above the probability of the event $B(i)$ that there exists a variable $\ell$ in bucket $i$ that violates the regularity constraint, i.e. such that $\mathrm{Inf}_\ell(p_\rho) > \tau' \sum_{j=k+1}^{n} \mathrm{Inf}_j(p)$. We will do this by a combination of Claim 9 and a union bound over the variables in the bucket. We will show:

Claim 10. We have that $\Pr_\rho[B(i)] \leq 2^{-(i+2)} \cdot C^{-d}$.

The above claim completes the proof of Lemma 8 by a union bound across buckets. Indeed, assuming the claim, the probability that any variable $\ell \in [k+1, n]$ violates the condition $\mathrm{Inf}_\ell(p_\rho) \leq \tau' \sum_{j=k+1}^{n} \mathrm{Inf}_j(p)$ is at most $\sum_{i=0}^{\infty} \Pr_\rho[B(i)] \leq C^{-d} 2^{-2} \sum_{i=0}^{\infty} (1/2)^i = (1/2) \cdot C^{-d}$.
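The bucketing step above can be sketched in Python as follows. The function name, the half-open convention $(\tau/2^{i+1}, \tau/2^i]$ used to resolve boundary cases, and the example values are assumptions for illustration; the input list holds the tail influences $\mathrm{Inf}_\ell(p)$ for $\ell \in [k+1, n]$, which are assumed to already satisfy the regularity condition.

```python
def bucket_variables(tail_influences, tau):
    """Group tail variables by the scale of their normalized influence.

    Variable idx lands in bucket i when
        tau / 2**(i+1) < Inf_idx / (sum of tail influences) <= tau / 2**i.
    Variables with zero influence are skipped.
    """
    total = sum(tail_influences)
    buckets = {}
    for idx, inf in enumerate(tail_influences):
        if inf == 0:
            continue
        ratio = inf / total
        assert ratio <= tau, "tail is assumed to satisfy the regularity condition"
        i = 0
        while ratio <= tau / 2 ** (i + 1):
            i += 1
        buckets.setdefault(i, []).append(idx)
    return buckets


# Toy tail (an assumption): the values sum to 1 and the largest ratio equals tau.
tail = [0.25, 0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625]
tau = 0.25
buckets = bucket_variables(tail, tau)
sizes_ok = all(len(v) <= 2 ** (i + 1) / tau for i, v in buckets.items())
```

Because the normalized influences sum to 1 and each member of bucket $i$ exceeds $\tau/2^{i+1}$, bucket $i$ can hold at most $2^{i+1}/\tau$ variables, which is the size bound the union bound within a bucket relies on.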

It thus remains to prove Claim 10. Fix a variable $\ell$ in the $i$-th bucket. We apply Claim 9, selecting a value of $t = \tilde{t}$ of order $(\ln(C^d 4^{i+2}/\tau))^d$, chosen so that the failure probability in Claim 9 is at most $\tau/(C^d 4^{i+2})$. It is clear that $\tilde{t} \leq c'^d (d + i + \ln\frac{1}{\tau})^d$ for some absolute constant $c'$. As a consequence, there is an absolute constant $C'$ such that for every $i$,

$$\tilde{t} \leq 3^{-d} C'^d 2^i \Big(d \ln d \ln\frac{1}{\tau}\Big)^d. \qquad (5)$$

(To see this, note that for $i \leq 10 d \ln d$ we have $d + i + \ln\frac{1}{\tau} \leq 11 d \ln d \ln\frac{1}{\tau}$, from which the claimed bound is easily seen to hold. For $i > 10 d \ln d$, we use $d + i + \ln\frac{1}{\tau} \leq d i \ln\frac{1}{\tau}$ and the fact that $i^d \leq 2^i$ for $i > 10 d \ln d$.) Inequality (5) can be rewritten as $3^d \cdot \tilde{t} \cdot \frac{\tau}{2^i} \leq \tau'$. Hence, our assumption on the range of $\mathrm{Inf}_\ell(p)$ gives

$$3^d \cdot \tilde{t} \cdot \mathrm{Inf}_\ell(p) \leq \tau' \cdot \sum_{j=k+1}^{n} \mathrm{Inf}_j(p).$$

Therefore, by Claim 9, the probability that coordinate $\ell$ violates the condition $\mathrm{Inf}_\ell(p_\rho) \leq \tau' \sum_{j=k+1}^{n} \mathrm{Inf}_j(p)$ is at most $\tau/(C^d 4^{i+2})$ by our choice of $\tilde{t}$. Since bucket $i$ contains at most $2^{i+1}/\tau$ coordinates, Claim 10 follows by a union bound. Hence Lemma 8, and thus Lemma 6, is proved. □

2.3 Putting Everything Together: Proof of Theorem 1

The following lemma combines the results of the previous two subsections (see Appendix B for a proof):

Lemma 11. Let $p : \{-1,1\}^n \to \mathbb{R}$ be a degree-$d$ polynomial and $0 < \tilde{\tau}, \beta < 1/2$. Fix $\alpha = \Theta(d \log\log(1/\beta) + d \log d)$ and $\tilde{\tau}' = \tilde{\tau} \cdot (C' d \ln d \ln(1/\tilde{\tau}))^d$, where $C'$ is a universal constant. (We assume w.l.o.g. that the variables are ordered s.t. $\mathrm{Inf}_i(p) \geq \mathrm{Inf}_{i+1}(p)$, $i \in [n-1]$.) One of the following statements holds true:

1. The polynomial $p$ is $\tilde{\tau}$-regular.

2. With probability at least $1/(2C^d)$ over a random restriction $\rho$ fixing the first $\alpha/\tilde{\tau}$ (most influential) variables of $p$, the function $\mathrm{sign}(p_\rho)$ is $\beta$-close to a constant function.

3. There exists a value $k \leq \alpha/\tilde{\tau}$, such that with probability at least $1/(2C^d)$ over a random restriction $\rho$ fixing the first $k$ (most influential) variables of $p$, the polynomial $p_\rho$ is $\tilde{\tau}'$-regular.

Proof of Theorem 1. We begin by observing that any function $f$ on $\{-1,1\}^n$ is equivalent to a decision tree where each internal node of the tree is labeled by a variable, every root-to-leaf path corresponds to a restriction $\rho$ that fixes the variables as they are set on the path, and every leaf is labeled with the restricted subfunction $f_\rho$. Given an arbitrary degree-$d$ PTF $f = \mathrm{sign}(p)$, we will construct a decision tree $\mathcal{T}$ of the form described in Theorem 1. It is clear that in any such tree every leaf function $f_\rho$ will be a degree-$d$ PTF; we must show that $\mathcal{T}$ has depth $\mathrm{depth}(d, \tau)$ and that with probability $1 - \tau$ over the choice of a random root-to-leaf path $\rho$, the restricted subfunction $f_\rho = \mathrm{sign}(p_\rho)$ is $\tau$-close to a $\tau$-regular degree-$d$ PTF.

For a tree $T$ computing $f = \mathrm{sign}(p)$, we denote by $N(T)$ its set of internal nodes and by $L(T)$ its set of leaves. We call a leaf $\rho \in L(T)$ "good" if the corresponding function $f_\rho$ is $\tau$-close to being $\tau$-regular. We call a leaf "bad" otherwise. Let $GL(T)$ and $BL(T)$ be the sets of "good" and "bad" leaves in $T$ respectively.

The basic approach for the proof is to invoke Lemma 11 repeatedly in a sequence of at most $2C^d \ln(1/\tau)$ stages. In the first stage we apply Lemma 11 to $f$ itself; this gives us an initial decision tree. In the second stage we apply Lemma 11 to those restricted subfunctions $f_\rho$ (corresponding to leaves of the initial decision tree) that are still $\tau$-far from being $\tau$-regular; this "grows" our initial decision tree. Subsequent stages continue similarly; we will argue that after at most $2C^d \ln(1/\tau)$ stages, the resulting tree satisfies the required properties for $\mathcal{T}$. In every application of Lemma 11 the parameters $\beta$ and $\tilde{\tau}'$ are both taken to be $\tau$; note that taking $\tilde{\tau}'$ to be $\tau$ sets the value of $\tilde{\tau}$ in Lemma 11 to a value that is less than $\tau$.

We now provide the details. In the first stage, the initial application of Lemma 11 results in a tree $T_1$. This tree $T_1$ may consist of a single leaf node that is $\tilde{\tau}$-regular (if $f$ is $\tilde{\tau}$-regular to begin with; in this case, since $\tilde{\tau} < \tau$, we are done), or a complete decision tree of depth $\alpha/\tilde{\tau}$ (if $f$ had large critical index), or a complete decision tree of depth $k \leq \alpha/\tilde{\tau}$ (if $f$ had small critical index). Note that in each case the depth of $T_1$ is at most $\alpha/\tilde{\tau}$. Lemma 11 guarantees that:

$$\Pr_{\rho \in T_1}[\rho \in BL(T_1)] \leq 1 - 1/(2C^d),$$

where the probability is over a random root-to-leaf path $\rho$ in $T_1$.

In the second stage, the "good" leaves $\rho \in GL(T_1)$ are left untouched; they will be leaves in the final tree $\mathcal{T}$. For each "bad" leaf $\rho \in BL(T_1)$, we order the unrestricted variables in decreasing order of their influence in the polynomial $p_\rho$, and we apply Lemma 11 to $f_\rho$. This "grows" $T_1$ at each bad leaf by replacing each such leaf with a new decision tree; we call the resulting overall decision tree $T_2$.

A key observation is that the probability that a random path from the root reaches a "bad" leaf is significantly smaller in $T_2$ than in $T_1$; in particular

$$\Pr_{\rho \in T_2}[\rho \in BL(T_2)] \leq (1 - 1/(2C^d))^2.$$

We argue this as follows: Let $\rho$ be any fixed "bad" leaf in $T_1$, i.e. $\rho \in BL(T_1)$. The function $f_\rho$ is not $\tilde{\tau}'$-regular and consequently not $\tilde{\tau}$-regular. Thus, either statement (2) or (3) of Lemma 11 must hold when the Lemma is applied to $f_\rho$. The tree that replaces $\rho$ in $T_2$ has depth at most $\alpha/\tilde{\tau}$, and a random root-to-leaf path $\rho_1$ in this tree reaches a "bad" leaf with probability at most $1 - 1/(2C^d)$. So the overall probability that a random root-to-leaf path in $T_2$ reaches a "bad" leaf is at most $(1 - 1/(2C^d))^2$.

Continuing in this fashion, in the $i$-th stage we replace all the bad leaves of $T_{i-1}$ by decision trees according to Lemma 11 and we obtain the tree $T_i$. An inductive argument gives that

$$\Pr_{\rho \in T_i}[\rho \in BL(T_i)] \leq (1 - 1/(2C^d))^i,$$

which is at most $\tau$ for $i^* := 2C^d \ln(1/\tau)$.

The depth of the overall tree will be the maximum number of stages ($2C^d \ln(1/\tau)$) times the maximum depth added in each stage (at most $\alpha/\tilde{\tau}$, since we always restrict at most this many variables), which is at most $(\alpha/\tilde{\tau}) \cdot i^*$. Since $\beta = \tau$, we get $\alpha = \Theta(d \log\log(1/\tau) + d \log d)$. Recalling that $\tilde{\tau}'$ in Lemma 11 is set to $\tau$, we see that $\tilde{\tau} = \tau/(C' d \ln d \ln(1/\tilde{\tau}))^d$. By substitution we get that the depth of the tree is upper bounded by $d^{O(d)} \cdot (1/\tau) \cdot \log(1/\tau)^{O(d)}$, which concludes the proof of Theorem 1. □
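The stage count in the argument above follows from $1 - x \leq e^{-x}$: after $i^*$ stages the bad-leaf probability is at most $\exp(-i^*/(2C^d)) \leq \tau$. The short Python sketch below evaluates both quantities numerically. The value $C = 2$ is purely an illustrative assumption (the paper only fixes $C = C_0^2$ for an unspecified universal constant $C_0$), and the function names are hypothetical.

```python
import math


def num_stages(C, d, tau):
    """i* = 2 * C^d * ln(1/tau), rounded up to an integer number of stages."""
    return math.ceil(2 * C ** d * math.log(1 / tau))


def bad_leaf_bound(C, d, stages):
    """Upper bound (1 - 1/(2 C^d))^stages on the bad-leaf probability."""
    return (1 - 1 / (2 * C ** d)) ** stages


# Illustrative parameters only; C = 2 is NOT the constant from the paper.
C, d, tau = 2, 3, 0.01
stages = num_stages(C, d, tau)
bound = bad_leaf_bound(C, d, stages)
```

With these placeholder values the bound after `stages` rounds falls below $\tau$, matching the inductive argument; the overall depth is then at most `stages` times the per-stage depth $\alpha/\tilde{\tau}$.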

3 Every degree-d PTF has a low-weight approximator 

In this section we apply Theorem 1 to prove Theorem 2, which we restate below:

Theorem 2. Let $f(x) = \mathrm{sign}(p(x))$ be any degree-$d$ PTF. Fix any $\epsilon > 0$ and let $\tau = (\Theta(1) \cdot \epsilon/d)^{8d}$. Then there is a polynomial $q(x)$ of degree $D = d + \mathrm{depth}(d, \tau)$ and weight $n^d \cdot 2^{4\,\mathrm{depth}(d,\tau)} \cdot (d/\epsilon)^{O(d)}$, which is such that the PTF $\mathrm{sign}(q(x))$ is $\epsilon$-close to $f$.

To prove Theorem 2, we first show that any sufficiently regular degree-$d$ PTF over $n$ variables has a low-weight approximator, of weight roughly $n^d$. Theorem 1 asserts that almost every leaf $\rho$ of $\mathcal{T}$ is close to a regular PTF; at each such leaf $\rho$ we use the low-weight approximator of the previous sentence to approximate the regular PTF, and thus to approximate $f_\rho$. Finally, we combine all of these low-weight polynomials to get an overall PTF of low weight which is a good approximator for $f$. We give details below.

3.1 Low- weight approximators for regular PTFs 

In this subsection we prove that every sufficiently regular PTF has a low-weight approximator of degree d: 

Lemma 12. Given e > 0, let r = (©(1) • e/d) 8d . Let p : {—1, l} n -> ffi^ea r-regular degree-d polynomial 
with Var[p] = 1. There exists a degree-d polynomial q : {—1, l} n — > R of weight n d ■ (d/e)°^ such that 
sign(g(x)) is an e-approximator for sign(p(x)). 

Proof. The polynomial $q$ is obtained by rounding the weights of $p$ to an appropriate granularity, similar to the regular case in [Ser07] for the $d = 1$ case. To show that this works, we use the fact that regular PTFs have very good anti-concentration. In particular we will use the following claim, which is proved using the invariance principle [MOO05] and Gaussian anti-concentration [CW01] (see Appendix D for the proof):

Claim 13. Let $p : \{-1,1\}^n \to \mathbb{R}$ be a $\tau$-regular degree-$d$ polynomial with $\mathrm{Var}[p] = 1$. Then $\Pr_x[|p(x)| \le \tau] \le O(d\tau^{1/8d})$.

We turn to the detailed proof of Lemma 12. We first note that if the constant coefficient $\hat{p}(\emptyset)$ of $p$ has magnitude greater than $(O(\log(1/\epsilon)))^{d/2}$, then Theorem 6 (applied to $p(x) - \hat{p}(\emptyset)$) implies that $\mathrm{sign}(p(x))$ agrees with $\mathrm{sign}(\hat{p}(\emptyset))$ for at least a $1 - \epsilon$ fraction of inputs $x$. So in this case $\mathrm{sign}(p(x))$ is $\epsilon$-close to a constant function, and the conclusion of the lemma certainly holds. Thus we henceforth assume that $|\hat{p}(\emptyset)|$ is at most $(O(\log(1/\epsilon)))^{d/2}$.

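Let

$$\alpha = \frac{\tau}{(K n \ln(4/\epsilon))^{d/2}},$$

where $K > 0$ is an absolute constant (specified later). For each $S \neq \emptyset$ let $\hat{q}(S)$ be the value obtained by rounding $\hat{p}(S)$ to the nearest integer multiple of $\alpha$, and let $\hat{q}(\emptyset)$ equal $\hat{p}(\emptyset)$. This defines a degree-$d$ polynomial $q(x) = \sum_S \hat{q}(S)\chi_S(x)$. It is easy to see that after rescaling by $\alpha$, all of the non-constant coefficients of $q(x)/\alpha$ are integers. Since each coefficient $\hat{q}(S)$ has magnitude at most twice that of $\hat{p}(S)$, and $\sum_{S \neq \emptyset} \hat{p}(S)^2 = \mathrm{Var}[p] = 1$, we may bound the sum of squares of coefficients of $q(x)/\alpha$ by

$$\frac{\hat{p}(\emptyset)^2 + \sum_{S \neq \emptyset} 4\hat{p}(S)^2}{\alpha^2} \le \frac{(O(\log(1/\epsilon)))^d + 4}{\alpha^2} \le n^d \cdot (d/\epsilon)^{O(d)}.$$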

We now observe that the constant coefficient $\hat{p}(\emptyset)$ of $q(x)$ can be rounded to an integer multiple of $\alpha$ without changing the value of $\mathrm{sign}(q(x))$ for any input $x$. Doing this, we obtain a polynomial $q'(x)/\alpha$ with all integer coefficients, weight $n^d \cdot (d/\epsilon)^{O(d)}$, and which has $\mathrm{sign}(q'(x)) = \mathrm{sign}(q(x))$ for all $x$.

In the rest of our analysis we shall consider the polynomial $q(x)$ (recall that the constant coefficient of $q(x)$ is precisely $\hat{p}(\emptyset)$). It remains to show that $\mathrm{sign}(q)$ is an $\epsilon$-approximator for $\mathrm{sign}(p)$. For each $S \neq \emptyset$ let $\hat{e}(S)$ equal $\hat{p}(S) - \hat{q}(S)$. This defines a polynomial (with constant term 0) $e(x) = \sum_S \hat{e}(S)\chi_S(x)$, and we have $q(x) + e(x) = p(x)$. (The coefficients $\hat{e}(S)$ are the "errors" induced by approximating $\hat{p}(S)$ by $\hat{q}(S)$.)

Recall that $\tau = (\Theta(1) \cdot \epsilon/d)^{8d}$. For any input $x$, we have that $\mathrm{sign}(q(x)) \neq \mathrm{sign}(p(x))$ only if either (i) $|e(x)| \ge \tau$, or (ii) $|p(x)| \le \tau$. Since each coefficient of $e(x)$ satisfies $|\hat{e}(S)| \le \alpha/2 = \frac{\tau}{2(K n \ln(4/\epsilon))^{d/2}}$, the sum of squares of all (at most $n^d$) coefficients of $e$ is at most

$$\sum_S \hat{e}(S)^2 \le \frac{\tau^2}{4(K \ln(4/\epsilon))^d}, \quad \text{and thus} \quad \|e\|_2 \le \frac{\tau}{2(K \ln(4/\epsilon))^{d/2}}.$$

Applying Theorem 6 we get that $\Pr_x[|e(x)| \ge \tau] \le \epsilon/2$ (for a suitable absolute constant choice of $K$), so we have upper bounded the probability of (i).

For (ii), we use the anti-concentration bound for regular polynomials, Claim 13. This directly gives us that $\Pr_x[|p(x)| \le \tau] \le O(d\tau^{1/8d}) \le \epsilon/2$.

Thus the probability, over a random $x$, that either (i) or (ii) holds is at most $\epsilon$. Consequently $\mathrm{sign}(q)$ is an $\epsilon$-approximator for $\mathrm{sign}(p)$, and Lemma 12 is proved. □
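
The rounding step in the proof of Lemma 12 can be illustrated on a toy instance. The following is a minimal sketch, not the authors' code: the dictionary representation of a polynomial, the helper names, and the values of `n`, `d` and `alpha` are all assumptions chosen for illustration (in particular `alpha` is an arbitrary granularity, not the value $\tau/(Kn\ln(4/\epsilon))^{d/2}$ from the proof).

```python
from itertools import combinations, product
from math import prod
import random

def round_coefficients(p, alpha):
    """Round every non-constant coefficient of p to the nearest multiple of alpha.

    p maps a tuple S of variable indices to hat p(S); the constant
    coefficient p[()] is left untouched, as in the construction above.
    """
    return {S: (c if S == () else alpha * round(c / alpha)) for S, c in p.items()}

def evaluate(p, x):
    """Evaluate the multilinear polynomial p at a point x of the cube."""
    return sum(c * prod(x[i] for i in S) for S, c in p.items())

def disagreement(p, q, n):
    """Fraction of the cube on which sign(p) and sign(q) differ."""
    cube = list(product((-1, 1), repeat=n))
    return sum((evaluate(p, x) > 0) != (evaluate(q, x) > 0) for x in cube) / len(cube)

# Hypothetical instance: a random polynomial of degree at most 2 on 8 variables.
random.seed(0)
n, d, alpha = 8, 2, 0.05
p = {S: random.gauss(0, 1) for r in range(d + 1) for S in combinations(range(n), r)}
q = round_coefficients(p, alpha)
```

Each coefficient moves by at most `alpha / 2`, which is the bound on $|\hat{e}(S)|$ used in the proof; `disagreement(p, q, n)` then measures the distance between the two PTFs by brute force.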

3.2 Proof of Theorem 2

Let $f = \mathrm{sign}(p)$ be an arbitrary degree-$d$ PTF over $n$ Boolean variables, and let $\epsilon > 0$ be the desired approximation parameter. We invoke Theorem 1 with its "$\tau$" parameter set to $\tau = (\Theta(1) \cdot (\epsilon/2)/d)^{8d}$ (i.e. our choice of $\tau$ is obtained by plugging in "$\epsilon/2$" for $\epsilon$ in the first sentence of Lemma 12). For each leaf $\rho$ of the tree $T$ as described in Theorem 1 (we call these "good" leaves), let $g^{(\rho)}$ be a $\tau$-regular degree-$d$ PTF that is $\tau$-close to $f_\rho$. By Lemma 12, for each such leaf $\rho$ there is a degree-$d$ polynomial of weight $n^d \cdot (d/\epsilon)^{O(d)}$, which we denote $q^{(\rho)}$, such that $g^{(\rho)}$ is $\epsilon/2$-close to $\mathrm{sign}(q^{(\rho)})$. For each of the other leaves in $T$ (which are reached by at most a $\tau$ fraction of all inputs to $T$; we call these "bad" leaves), for which $f_\rho$ is not $\tau$-close to any $\tau$-regular degree-$d$ PTF, let $q^{(\rho)}$ be the constant-1 function.

For each leaf $\rho$ of depth $r$ in $T$, let $P_\rho(x)$ be the unique multilinear polynomial of degree $r$ which outputs $2^r$ iff $x$ reaches $\rho$ and outputs 0 otherwise. (As an example, if $\rho$ is a leaf which is reached by the path "$x_3 = -1$, $x_6 = 1$, $x_2 = 1$" from the root in $T$, then $P_\rho(x)$ would be $(1 - x_3)(1 + x_6)(1 + x_2)$.) Our final PTF is

$$g(x) = \mathrm{sign}(Q(x)), \quad \text{where} \quad Q(x) = \sum_{\rho} P_\rho(x)\, q^{(\rho)}(x).$$

It is easy to see that on any input $x$, the value $Q(x)$ equals $2^{|\rho_x|} \cdot q^{(\rho_x)}(x)$, where we write $\rho_x$ to denote the leaf of $T$ that $x$ reaches and $|\rho_x|$ to denote the depth of that leaf. Thus $\mathrm{sign}(Q(x))$ equals $\mathrm{sign}(q^{(\rho_x)}(x))$ for each $x$, and from this it follows that $\Pr_x[g(x) \neq f(x)]$ is at most $\tau + \tau + \epsilon/2 < \epsilon$. Here the first $\tau$ is because a random input $x$ may reach a bad leaf with probability up to $\tau$; the second $\tau$ is because for each good leaf $\rho$, the function $g^{(\rho)}$ is $\tau$-close to $f_\rho$; and the $\epsilon/2$ is because $\mathrm{sign}(q^{(\rho)})$ is $\epsilon/2$-close to $g^{(\rho)}$.
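
The way the leaf indicators $P_\rho$ select a single leaf polynomial can be checked on a small tree. The sketch below is illustrative only: the encoding of a path as `(index, value)` pairs, the function names, and the depth-2 tree with its leaf polynomials are hypothetical choices, not part of the construction above.

```python
from itertools import product

def leaf_polynomial(path, x):
    """P_rho(x) for a leaf reached by `path`, a list of (index, value) pairs.

    Each factor (1 + value * x[index]) is 2 when x agrees with the path
    and 0 otherwise, so the product is 2^r on the leaf and 0 elsewhere.
    """
    result = 1
    for index, value in path:
        result *= 1 + value * x[index]
    return result

def evaluate_Q(leaves, x):
    """Q(x) = sum over leaves of P_rho(x) * q_rho(x).

    `leaves` is a list of (path, q) pairs, q being a callable leaf polynomial.
    """
    return sum(leaf_polynomial(path, x) * q(x) for path, q in leaves)

# Hypothetical depth-2 tree on 3 variables (0-indexed): query x0, then x1 or x2.
leaves = [
    ([(0, -1), (1, -1)], lambda x: -1),
    ([(0, -1), (1, 1)], lambda x: x[2]),
    ([(0, 1), (2, -1)], lambda x: x[1]),
    ([(0, 1), (2, 1)], lambda x: 1),
]

# Every input reaches exactly one leaf, where Q(x) = 2^depth * q(x).
values = {x: evaluate_Q(leaves, x) for x in product((-1, 1), repeat=3)}
```

On the path from the text, `leaf_polynomial([(3, -1), (6, 1), (2, 1)], x)` computes $(1 - x_3)(1 + x_6)(1 + x_2)$.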

Since $T$ has depth $\mathrm{depth}(d, \tau)$, it is easy to see that $Q$ has degree at most $\mathrm{depth}(d, \tau) + d$. It is clear that the coefficients of $Q$ are all integers, so it remains only to bound the sum of squares of these coefficients. Each polynomial addend $P_\rho(x) q^{(\rho)}(x)$ in the sum is easily seen to have sum of squared coefficients

$$\sum_S \widehat{P_\rho q^{(\rho)}}(S)^2 = \mathbf{E}[(P_\rho \cdot q^{(\rho)})^2] \le \left(\max_x P_\rho(x)^2\right) \cdot \mathbf{E}[q^{(\rho)}(x)^2] \le 2^{2\,\mathrm{depth}(d,\tau)} \cdot n^d \cdot (d/\epsilon)^{O(d)}. \tag{6}$$

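Since $T$ has depth $\mathrm{depth}(d, \tau)$, the number of leaves $\rho$ is at most $2^{\mathrm{depth}(d,\tau)}$, and hence for each $S$ by Cauchy-Schwarz we have

$$\hat{Q}(S)^2 = \left(\sum_\rho \widehat{P_\rho q^{(\rho)}}(S)\right)^2 \le 2^{\mathrm{depth}(d,\tau)} \cdot \sum_\rho \widehat{P_\rho q^{(\rho)}}(S)^2. \tag{7}$$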

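This implies that the total weight of $Q$ is

$$\begin{aligned}
\sum_S \hat{Q}(S)^2 &\le 2^{\mathrm{depth}(d,\tau)} \cdot \sum_{\rho, S} \widehat{P_\rho q^{(\rho)}}(S)^2 && \text{(using (7))} \\
&\le 2^{2\,\mathrm{depth}(d,\tau)} \cdot \max_\rho \sum_S \widehat{P_\rho q^{(\rho)}}(S)^2 \\
&\le 2^{4\,\mathrm{depth}(d,\tau)} \cdot n^d \cdot (d/\epsilon)^{O(d)} && \text{(using (6))}
\end{aligned}$$

and Theorem 2 is proved. □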
3.3 Degree-d PTFs require $\tilde{\Omega}(n^d)$-weight approximators



In this section we give two lower bounds on the weight required to $\epsilon$-approximate certain degree-$d$ PTFs. (We use the notation $\Omega_d(\cdot)$ below to indicate that the hidden constant of the big-Omega depends on $d$.)

Theorem 3. For all sufficiently large $n$, there is a degree-$d$ $n$-variable PTF $f(x)$ with the following property: Let $K(d)$ be any positive-valued function depending only on $d$. Suppose that $g(x) = \mathrm{sign}(q(x))$ is a degree-$K(d)$ PTF with integer coefficients $\hat{q}(S)$ such that $\mathrm{dist}(f, g) \le \epsilon^*$, where $\epsilon^* := C^{-d}/2$. Then the weight of $q$ is $\Omega_d(n^d / \log n)$.

Theorem 4. For all sufficiently large $n$, there is a degree-$d$ $n$-variable PTF $f(x)$ with the following property: Suppose that $g(x) = \mathrm{sign}(q(x))$ is any PTF (of any degree) with integer coefficients $\hat{q}(S)$ such that $\mathrm{dist}(f, g) \le \epsilon^*$, where $\epsilon^* := C^{-d}/2$. Then the weight of $q$ is $\Omega_d(n^{d-1})$.

Viewing $d$ and $\epsilon$ as constants, Theorem 3 implies that the $O(n^d)$ weight bound of our $\epsilon$-approximator from Theorem 2 (which has constant degree) is essentially optimal for any constant-degree $\epsilon$-approximator. Theorem 4 says that there is only small room for improving our weight bound even if arbitrary-degree PTFs are allowed as approximators. We prove these results in Appendix D.2.
A Useful Background Results

A.1 Fourier Analysis over $\{-1,1\}^n$

We consider functions $f : \{-1,1\}^n \to \mathbb{R}$ (though we often focus on Boolean-valued functions which map to $\{-1,1\}$), and we think of the inputs $x$ to $f$ as being distributed according to the uniform probability distribution. The set of such functions forms a $2^n$-dimensional inner product space with inner product given by $\langle f, g \rangle = \mathbf{E}[f(x)g(x)]$. The set of functions $(\chi_S)_{S \subseteq [n]}$ defined by $\chi_S(x) = \prod_{i \in S} x_i$ forms a complete orthonormal basis for this space. Given a function $f : \{-1,1\}^n \to \mathbb{R}$ we define its Fourier coefficients by $\hat{f}(S) = \mathbf{E}[f(x)\chi_S(x)]$, and we have that $f(x) = \sum_S \hat{f}(S)\chi_S(x)$. We refer to the maximum $|S|$ over all nonzero $\hat{f}(S)$ as the Fourier degree of $f$.

As an easy consequence of orthonormality we have Plancherel's identity $\langle f, g \rangle = \sum_S \hat{f}(S)\hat{g}(S)$, which has as a special case Parseval's identity, $\mathbf{E}[f(x)^2] = \sum_S \hat{f}(S)^2$. From this it follows that for every $f : \{-1,1\}^n \to \{-1,1\}$ we have $\sum_S \hat{f}(S)^2 = 1$. We recall the well-known fact (see e.g. [KKL88]) that the total influence $\mathrm{Inf}(f)$ of any Boolean function equals $\sum_S \hat{f}(S)^2 |S|$. Note that, in this setting, the expectation and the variance can be expressed in terms of the Fourier coefficients of $f$ by $\mathbf{E}[f] = \hat{f}(\emptyset)$ and $\mathrm{Var}[f] = \sum_{\emptyset \neq S \subseteq [n]} \hat{f}(S)^2$.
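
For small $n$ these identities can be checked by brute force. The following sketch is only an illustration under assumed conventions: variables are 0-indexed, sets $S$ are represented as sorted tuples, and the example function `maj3` (majority on three bits) is a hypothetical choice.

```python
from itertools import combinations, product
from math import prod

def subsets(n):
    """All subsets of {0, ..., n-1}, as sorted tuples."""
    return [S for r in range(n + 1) for S in combinations(range(n), r)]

def chi(S, x):
    """Character chi_S(x): the product of x_i over i in S."""
    return prod(x[i] for i in S)

def fourier_coefficients(f, n):
    """Brute-force hat f(S) = E_x[f(x) chi_S(x)] over the uniform cube."""
    cube = list(product((-1, 1), repeat=n))
    return {S: sum(f(x) * chi(S, x) for x in cube) / len(cube) for S in subsets(n)}

def influence(coeffs, i):
    """Inf_i(f): the sum of hat f(S)^2 over all S containing i."""
    return sum(c * c for S, c in coeffs.items() if i in S)

# Hypothetical example: majority on three bits, a degree-1 PTF.
maj3 = lambda x: 1 if sum(x) > 0 else -1
coeffs = fourier_coefficients(maj3, 3)
parseval = sum(c * c for c in coeffs.values())
total_influence = sum(c * c * len(S) for S, c in coeffs.items())
```

Here `parseval` is the sum of squared coefficients from Parseval's identity, and `total_influence` agrees with the sum of the three coordinate influences returned by `influence`.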

A.2 Useful Probability Bounds for Section 2

We first recall the following moment bound for low-degree polynomials, which is equivalent to the well-known hypercontractive inequality of [Bon70, Gro75]:

Theorem 5. Let $p : \{-1,1\}^n \to \mathbb{R}$ be a degree-$d$ polynomial and $q > 2$. Then

$$\|p\|_q \le (q-1)^{d/2} \|p\|_2.$$
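
As a sanity check, the $q = 4$ case of Theorem 5 (the case used in Appendix B.3) can be evaluated numerically on a small random polynomial. This is a sketch under assumed conventions; the dictionary representation, the helper name `norm`, and the parameters `n = 6`, `d = 3` are hypothetical.

```python
from itertools import combinations, product
from math import prod
import random

def norm(p, n, q):
    """||p||_q = (E_x |p(x)|^q)^(1/q) under the uniform distribution on the cube."""
    cube = list(product((-1, 1), repeat=n))
    vals = [abs(sum(c * prod(x[i] for i in S) for S, c in p.items())) for x in cube]
    return (sum(v ** q for v in vals) / len(vals)) ** (1 / q)

# Hypothetical instance: random Gaussian coefficients on all sets of size <= d.
random.seed(2)
n, d = 6, 3
p = {S: random.gauss(0, 1) for r in range(d + 1) for S in combinations(range(n), r)}
lhs, rhs = norm(p, n, 4), 3 ** (d / 2) * norm(p, n, 2)
```

Theorem 5 with $q = 4$ states that `lhs` is at most `rhs` for every polynomial of degree at most $d$.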

The following concentration bound for low-degree polynomials, a simple corollary of hypercontractivity, is well known (see e.g. [O'D07b, DFKO06, AH09]):

Theorem 6. Let $p : \{-1,1\}^n \to \mathbb{R}$ be a degree-$d$ polynomial. For any $t > e^d$, we have

$$\Pr_x[|p(x)| \ge t\|p\|_2] \le \exp(-\Omega(t^{2/d})).$$

We will also need the following weak anti-concentration bound for low-degree polynomials over the cube:

Theorem 7 ([DFKO06, AH09]). There is a universal constant $C_0 > 1$ such that for any non-zero degree-$d$ polynomial $p : \{-1,1\}^n \to \mathbb{R}$ with $\mathbf{E}[p] = 0$, we have

$$\Pr_x[p(x) > C_0^{-d} \cdot \|p\|_2] > C_0^{-d}.$$

Throughout this paper, we let $C = C_0^2$, where $C_0$ is the universal constant from Theorem 7. Note that since $C > C_0$, Theorem 7 holds for $C$ as well.

A.3 Useful Probability Bounds for Section 3

We denote by $\mathcal{N}^n$ the standard $n$-dimensional Gaussian distribution $\mathcal{N}(0,1)^n$.

The following two facts will be useful in the proof of Theorem 2, in particular in the analysis of the regular case. The first fact is a powerful anti-concentration bound for low-degree polynomials over independent Gaussian random variables:

Theorem 8 ([CW01]). Let $p : \mathbb{R}^n \to \mathbb{R}$ be a nonzero degree-$d$ polynomial. For all $\epsilon > 0$ we have

$$\Pr_{G \sim \mathcal{N}^n}[|p(G)| \le \epsilon\|p\|_2] \le O(d\epsilon^{1/d}).$$

We note that the above bound is essentially tight, even for multi-linear polynomials. 

The second fact is a version of the invariance principle of Mossel, O'Donnell and Oleszkiewicz, specifically Theorem 3.19 under hypothesis H4 in [MOO05]:

Theorem 9 ([MOO05]). Let $p(x) = \sum_{S \subseteq [n], |S| \le d} \hat{p}(S)\chi_S(x)$ be a degree-$d$ multilinear polynomial with $\mathrm{Var}[p] = 1$. Suppose each coordinate $i \in [n]$ has $\mathrm{Inf}_i(p) \le \tau$. Then,

$$\sup_{t \in \mathbb{R}} \left| \Pr_x[p(x) \le t] - \Pr_{G \sim \mathcal{N}^n}[p(G) \le t] \right| \le O(d\tau^{1/(8d)}).$$

B Omitted Proofs from Section 2

B.1 Proof of Lemma 1

Recall Lemma 1:

Lemma 1. Let $p : \{-1,1\}^n \to \mathbb{R}$ and $\tau > 0$. Let $k$ be the $\tau$-critical index of $p$. For $j \in [0, k]$ we have

$$\sum_{i=j+1}^{n} \mathrm{Inf}_i(p) \le (1-\tau)^j \cdot \mathrm{Inf}(p).$$

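Proof. The lemma trivially holds for $j = 0$. In general, since $j$ is at most $k$, we have that $\mathrm{Inf}_j(p) \ge \tau \cdot \sum_{i=j}^{n} \mathrm{Inf}_i(p)$, or equivalently $\sum_{i=j+1}^{n} \mathrm{Inf}_i(p) \le (1-\tau) \cdot \sum_{i=j}^{n} \mathrm{Inf}_i(p)$, which yields the claimed bound by induction on $j$. □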
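
The geometric decay in Lemma 1 can be observed on a concrete influence vector. The sketch below assumes the usual definition of the $\tau$-critical index (the least $k$ such that, with influences sorted in non-increasing order, the $(k+1)$-st influence is at most a $\tau$ fraction of the total influence of variables $k+1, \dots, n$); that definition is given in Section 2 and is restated here only as an assumption. The function names and the example vector are hypothetical.

```python
def critical_index(infl, tau):
    """Least k such that the tail infl[k:] is tau-regular (assumed definition).

    `infl` must be sorted in non-increasing order; returns len(infl) if no
    such k exists.
    """
    for k in range(len(infl)):
        if infl[k] <= tau * sum(infl[k:]):
            return k
    return len(infl)

def tail_bounds(infl, tau):
    """Pairs (tail influence after j variables, Lemma 1 bound) for j = 0..k."""
    k, total = critical_index(infl, tau), sum(infl)
    return [(sum(infl[j:]), (1 - tau) ** j * total) for j in range(k + 1)]

# Hypothetical influence vector: a geometric head followed by a flat tail.
infl = [0.5 ** i for i in range(1, 10)] + [0.001] * 50
bounds = tail_bounds(infl, tau=0.1)
```

Each pair in `bounds` compares the influence remaining after the first $j$ variables with the bound $(1-\tau)^j \cdot \mathrm{Inf}(p)$.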

B.2 Proof of Lemma 2

To prove Lemma 2 we first recall an observation about the expected value of Fourier coefficients under random restrictions (see e.g. [LMN93]):

Fact 14. Let $p : \{-1,1\}^n \to \mathbb{R}$. Consider a random assignment $\rho$ to the variables $x_1, \dots, x_k$. Fix any $S \subseteq [k+1, n]$. Then we have $\hat{p}_\rho(S) = \sum_{T \subseteq [k]} \hat{p}(S \cup T)\rho_T$, where $\rho_T = \prod_{i \in T} \rho_i$, and therefore $\mathbf{E}_\rho[\hat{p}_\rho(S)^2] = \sum_{T \subseteq [k]} \hat{p}(S \cup T)^2$.
In words, the above fact says that all the Fourier weight on sets of the form $S \cup \{\text{any subset of restricted variables}\}$ "collapses" down onto $S$ in expectation. Consequently, the influence of an unrestricted variable does not change in expectation under random restrictions:

Lemma 2. Let $p : \{-1,1\}^n \to \mathbb{R}$. Consider a random assignment $\rho$ to the variables $x_1, \dots, x_k$ and fix $\ell \in [k+1, n]$. Then $\mathbf{E}_\rho[\mathrm{Inf}_\ell(p_\rho)] = \mathrm{Inf}_\ell(p)$.

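Proof. We have

$$\mathbf{E}_\rho[\mathrm{Inf}_\ell(p_\rho)] = \mathbf{E}_\rho\left[\sum_{\ell \in S \subseteq [k+1,n]} \hat{p}_\rho(S)^2\right] = \sum_{T \subseteq [k]} \sum_{\ell \in S \subseteq [k+1,n]} \hat{p}(S \cup T)^2 = \sum_{\ell \in U \subseteq [n]} \hat{p}(U)^2 = \mathrm{Inf}_\ell(p).$$

□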
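
Lemma 2 can be verified exhaustively on a small polynomial by averaging over all restrictions. In this sketch the representation (a dictionary from `frozenset` to coefficient, variables 0-indexed, the first `k` variables restricted) and the example polynomial are assumptions made for illustration.

```python
from itertools import product
from math import prod

def restrict(p, k, rho):
    """Fourier coefficients of p after fixing x_0..x_{k-1} to rho.

    p maps frozenset(S) -> hat p(S); by Fact 14 each coefficient hat p(S u T)
    with T inside the restricted block collapses onto S with sign rho_T.
    """
    out = {}
    for U, c in p.items():
        T = [i for i in U if i < k]
        S = frozenset(i for i in U if i >= k)
        out[S] = out.get(S, 0.0) + c * prod(rho[i] for i in T)
    return out

def influence(p, l):
    """Inf_l(p): the sum of squared coefficients over sets containing l."""
    return sum(c * c for S, c in p.items() if l in S)

def expected_influence(p, k, l):
    """Average of Inf_l(p_rho) over all 2^k assignments rho."""
    rhos = list(product((-1, 1), repeat=k))
    return sum(influence(restrict(p, k, rho), l) for rho in rhos) / len(rhos)

# Hypothetical degree-2 polynomial on 4 variables (0-indexed).
p = {frozenset({0, 2}): 0.5, frozenset({1, 2}): -0.25,
     frozenset({2, 3}): 1.0, frozenset({0, 3}): 0.75, frozenset({2}): 0.3}
```

For any unrestricted variable `l >= k`, `expected_influence(p, k, l)` should match `influence(p, l)` up to floating-point error, even though individual restrictions can raise or lower the influence.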



B.3 Proof of Claim 9

Recall Claim 9:

Claim 9. Let $p : \{-1,1\}^n \to \mathbb{R}$ be a degree-$d$ polynomial. Let $\rho$ be a random restriction fixing $[j]$. Fix any $t > e^{2d}$ and any $\ell \in [j+1, n]$. With probability at least $1 - \exp(-\Omega(t^{1/d}))$ over $\rho$, we have $\mathrm{Inf}_\ell(p_\rho) \le 3^d t\, \mathrm{Inf}_\ell(p)$.

Proof. The identity $\mathrm{Inf}_\ell(p_\rho) = \sum_{\ell \in S \subseteq [j+1,n]} \hat{p}_\rho(S)^2$ and Fact 14 imply that $\mathrm{Inf}_\ell(p_\rho)$ is a degree-$2d$ polynomial in $\rho$. Hence the claim follows from the concentration bound, Theorem 6, assuming we can appropriately upper bound the $\ell_2$ norm of the polynomial $\mathrm{Inf}_\ell(p_\rho)$. So to prove Claim 9 it suffices to show that

$$\|\mathrm{Inf}_\ell(p_\rho)\|_2 \le 3^d\, \mathrm{Inf}_\ell(p). \tag{8}$$

The proof of Equation (8) is similar to the argument establishing that $\|Q\|_2 \le 3^d d e^{-\alpha}$ in Section 2.1. The triangle inequality tells us that we may bound the $\ell_2$-norm of each squared coefficient separately:

$$\|\mathrm{Inf}_\ell(p_\rho)\|_2 \le \sum_{\ell \in S \subseteq [j+1,n]} \|\hat{p}_\rho(S)^2\|_2.$$

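Since $\hat{p}_\rho(S)$ is a degree-$d$ polynomial in $\rho$, Theorem 5 (with $q = 4$) yields that

$$\|\hat{p}_\rho(S)^2\|_2 = \|\hat{p}_\rho(S)\|_4^2 \le 3^d \|\hat{p}_\rho(S)\|_2^2,$$

hence

$$\|\mathrm{Inf}_\ell(p_\rho)\|_2 \le 3^d \sum_{\ell \in S \subseteq [j+1,n]} \|\hat{p}_\rho(S)\|_2^2 = 3^d\, \mathrm{Inf}_\ell(p),$$

where the last equality is a consequence of Fact 14. Thus Equation (8) holds, and Claim 9 is proved. □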
B.4 Proof of Lemma 11

Recall Lemma 11:

Lemma 11. Let $p : \{-1,1\}^n \to \mathbb{R}$ be a degree-$d$ polynomial and $0 < \tau, \beta < 1/2$. Fix $\alpha = \Theta(d \log\log(1/\beta) + d \log d)$ and $\tau' = \tau \cdot (C' d \ln d \ln(1/\tau))^d$, where $C'$ is a universal constant. (We assume w.l.o.g. that the variables are ordered s.t. $\mathrm{Inf}_i(p) \ge \mathrm{Inf}_{i+1}(p)$, $i \in [n-1]$.) One of the following statements holds true:

1. The polynomial $p$ is $\tau$-regular.

2. With probability at least $1/(2C^d)$ over a random restriction $\rho$ fixing the first $\alpha/\tau$ (most influential) variables of $p$, the function $\mathrm{sign}(p_\rho)$ is $\beta$-close to a constant function.

3. There exists a value $k \le \alpha/\tau$, such that with probability at least $1/(2C^d)$ over a random restriction $\rho$ fixing the first $k$ (most influential) variables of $p$, the polynomial $p_\rho$ is $\tau'$-regular.

Proof. The proof is by a case analysis based on the value $\ell$ of the $\tau$-critical index of the polynomial $p$. If $\ell = 0$, then by definition $p$ is $\tau$-regular, hence the first statement of the lemma holds. If $\ell > \alpha/\tau$, then we randomly restrict the first $\alpha/\tau$ many variables. Lemma 3 says that for a random restriction $\rho$ fixing these variables, with probability at least $1/(2C^d)$ the (restricted) degree-$d$ PTF $\mathrm{sign}(p_\rho)$ is $\beta$-close to a constant. Hence, in this case, the second statement holds. To handle the case $\ell \in [1, \alpha/\tau]$, we apply Lemma 6. This lemma says that with probability at least $1/(2C^d)$ over a random restriction $\rho$ fixing variables $[\ell]$, the polynomial $p_\rho$ is $\tau'$-regular, so the third statement of Lemma 11 holds. □

C Comparison with [MZ09]

We comment on the relation of our main result, Theorem 1, with a very similar decomposition result of Meka and Zuckerman [MZ09]. They obtain a small-depth decision tree such that most leaves are $\epsilon$-close to being $\epsilon$-regular under a stronger definition of regularity, which we will call "$\epsilon$-regularity in $\ell_2$" to distinguish it from our notion.

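Let $p : \{-1,1\}^n \to \mathbb{R}$ be a polynomial and $\epsilon > 0$. We say that the polynomial $p$ is "$\epsilon$-regular in $\ell_2$" if

$$\sqrt{\sum_{i=1}^{n} \mathrm{Inf}_i(p)^2} \le \epsilon \cdot \sum_{i=1}^{n} \mathrm{Inf}_i(p).$$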

Recall that in our definition of regularity, instead of upper bounding the $\ell_2$-norm of the influence vector $I = (\mathrm{Inf}_1(p), \dots, \mathrm{Inf}_n(p))$ by $\epsilon$ times the total influence of $p$ (i.e. the $\ell_1$ norm of $I$), we upper bound the $\ell_\infty$ norm (i.e. the maximum influence). We may thus call our notion "$\epsilon$-regularity in $\ell_\infty$".

Note that if a polynomial is $\epsilon$-regular in $\ell_2$, then it is also $\epsilon$-regular in $\ell_\infty$. (And this implication is easily seen to be essentially tight, e.g. if we have many variables with tiny influence and one variable with an $\epsilon$-fraction of the total influence.) For the other direction, if a polynomial is $\epsilon$-regular in $\ell_\infty$, then it is $\sqrt{\epsilon}$-regular in $\ell_2$. (This is also tight if we have $1/\epsilon$ many variables with influence $\epsilon$.)
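
Both implications, and the two tightness examples, can be checked numerically on influence vectors. The sketch below is illustrative: the function names and the concrete vectors `spiky` and `flat` (with `eps = 0.01`) are hypothetical instances of the two examples mentioned in the text.

```python
from math import sqrt

def reg_l2(infl):
    """Smallest eps for which the influence vector is eps-regular in l_2."""
    return sqrt(sum(v * v for v in infl)) / sum(infl)

def reg_linf(infl):
    """Smallest eps for which it is eps-regular in l_infinity."""
    return max(infl) / sum(infl)

eps = 0.01
# One variable holding an eps-fraction, the rest spread over many tiny ones.
spiky = [eps] + [(1 - eps) / 10**5] * 10**5
# 1/eps variables, each with influence eps.
flat = [eps] * round(1 / eps)
```

For `spiky` the two regularity parameters nearly coincide, while for `flat` the $\ell_2$ parameter is $\sqrt{\epsilon}$ although the $\ell_\infty$ parameter is $\epsilon$.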

Meka and Zuckerman prove the following statement: 

Every degree-$d$ PTF $f = \mathrm{sign}(p)$ can be expressed as a decision tree of depth $2^{O(d)} \cdot (1/\epsilon^2) \log^2(1/\epsilon)$ with variables at the internal nodes and a degree-$d$ PTF $f_\rho = \mathrm{sign}(p_\rho)$ at each leaf $\rho$, such that with probability $1 - \epsilon$, a random root-to-leaf path reaches a leaf $\rho$ such that $f_\rho$ is $\epsilon$-close to being $\epsilon$-regular in $\ell_2$. (In particular, for a "good" leaf $\rho$, either $p_\rho$ will be $\epsilon$-regular in $\ell_2$ or $f_\rho$ will be $\epsilon$-close to a constant).



Theorem 1 shows exactly the same statement as the one above if we replace "$\ell_2$" by "$\ell_\infty$" and the depth of the tree by $(1/\epsilon) \cdot (d \log(1/\epsilon))^{O(d)}$.

Since $\epsilon$-regularity in $\ell_2$ implies $\epsilon$-regularity in $\ell_\infty$, the result of [MZ09] implies a version of Theorem 1 which has depth of $2^{O(d)} \cdot (1/\epsilon^2) \log^2(1/\epsilon)$. Hence the [MZ09] result and our result are quantitatively incomparable to each other. Roughly, if $d$ is a constant (independent of $\epsilon$), then our Theorem 1 is asymptotically better when $\epsilon$ becomes small. This range of parameters is quite natural in the context of pseudo-random generators. In particular, in the recent proof that $\mathrm{poly}(1/\epsilon)$-wise independence $\epsilon$-fools degree-2 PTFs [DKN09], using [MZ09] instead of Theorem 1 would give a worse bound on the degree of independence (namely, $\tilde{O}(\epsilon^{-18})$ as opposed to $\tilde{O}(\epsilon^{-9})$). On the other hand, if $d = \Omega(\log(1/\epsilon))$, then the result of [MZ09] is better.



D Omitted Proofs from Section 3

D.1 Proof of Claim 13

Claim 13. Let $p : \{-1,1\}^n \to \mathbb{R}$ be a $\tau$-regular degree-$d$ polynomial with $\mathrm{Var}[p] = 1$. Then $\Pr_x[|p(x)| \le \tau] \le O(d\tau^{1/8d})$.

Proof. We recall that, since $\mathrm{Var}[p] = 1$ and $p$ is of degree $d$, it holds that $\mathrm{Inf}(p) \le d$. Thus, since $p$ is $\tau$-regular, we have that $\max_{i \in [n]} \mathrm{Inf}_i(p) \le d\tau$. An application of the invariance principle (Theorem 9) in tandem with anti-concentration in Gaussian space (Theorem 8) yields

$$\begin{aligned}
\Pr_x[|p(x)| \le \tau] &\le O(d \cdot (d\tau)^{1/8d}) + \Pr_{G \sim \mathcal{N}^n}[|p(G)| \le \tau] \\
&\le O(d\tau^{1/8d}) + O(d\tau^{1/d}) = O(d\tau^{1/8d}),
\end{aligned}$$

and the claim follows. □



D.2 Proof of Lower Bound Theorems 3 and 4

Theorems 3 and 4 are both consequences of the following theorem:

Theorem 10. There exists a set $\mathcal{C} = \{f_1, \dots, f_M\}$ of $M = 2^{\Omega_d(n^d)}$ degree-$d$ PTFs $f_i$ such that for any $1 \le i < j \le M$, we have $\mathrm{dist}(f_i, f_j) \ge C^{-d}$.

Proof of Theorems 3 and 4 assuming Theorem 10. First we prove Theorem 3. We begin by claiming that there are at most $\left(3\binom{n}{\le K(d)}\right)^A$ many integer-weight PTFs of degree $K(d)$ and weight at most $A$. This is because any such PTF can be obtained by making a sequence of $A$ steps, where at each step either $-1$, $0$, or $1$ is added to one of the $\binom{n}{\le K(d)}$ many monomials of degree at most $K(d)$. Each step can be carried out in $3\binom{n}{\le K(d)}$ ways, giving the claimed bound.

By Theorem 10 there are $M$ distinct degree-$d$ PTFs $f_1, \dots, f_M$ any two of which are $C^{-d}$-far from each other. Consequently any Boolean function (in particular, any weight-$A$ degree-$K(d)$ PTF $g$) can have $\mathrm{dist}(g, f_i) < C^{-d}/2$ for at most one $f_i$. Since there are only $\left(3\binom{n}{\le K(d)}\right)^A$ many weight-$A$ degree-$K(d)$ PTFs, and $\left(3\binom{n}{\le K(d)}\right)^A$ is less than $M$ for some $A = \Omega_d(n^d / \log n)$, it follows that some $f_i$ must have distance at least $C^{-d}/2$ from every weight-$A$, degree-$K(d)$ PTF. This gives Theorem 3.

The proof of Theorem 4 is nearly identical. We now use the fact that there are at most $(3 \cdot 2^n)^A$ many integer-weight PTFs of weight at most $A$ (and any degree), and use the fact that $(3 \cdot 2^n)^A$ is less than $M$ for some $A = \Omega_d(n^{d-1})$. □
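
The two counting comparisons above can be made explicit by taking logarithms. The derivation below is a sketch; the constant name $c_d$ (the hidden constant in the exponent of $M$) is introduced here only for illustration, and $\binom{n}{\le K}$ denotes $\sum_{i=0}^{K}\binom{n}{i}$, which is at most $(n+1)^K$.

```latex
\begin{align*}
\log_2 \Bigl(3\tbinom{n}{\le K(d)}\Bigr)^{A}
  &\le A\,\bigl(\log_2 3 + K(d)\log_2(n+1)\bigr), \\
\log_2 \bigl(3\cdot 2^{n}\bigr)^{A}
  &= A\,(n + \log_2 3).
\end{align*}
Writing $M = 2^{c_d n^d}$, the first count is below $M$ whenever
\[
  A < \frac{c_d\, n^d}{\log_2 3 + K(d)\log_2(n+1)} = \Omega_d\!\left(\frac{n^d}{\log n}\right),
\]
and the second is below $M$ whenever
\[
  A < \frac{c_d\, n^d}{n + \log_2 3} = \Omega_d\!\left(n^{d-1}\right).
\]
```
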
It remains to prove Theorem 10.

D.2.1 Proof of Theorem 10

The proof is by the probabilistic method. We define the following distribution $\mathcal{D}$ over $n$-variable degree-$d$ polynomials. A draw of $p(x) = \sum_{S \subseteq [n], |S| = d} \hat{p}(S)\chi_S(x)$ from $\mathcal{D}$ is obtained in the following way: each of the $\binom{n}{d}$ coefficients $\hat{p}(S)$ is independently and uniformly selected from $\{-1, 1\}$.

We will prove Theorem 10 using Lemma 15, which says that it is extremely likely that the polynomial $c$ (the product of two independent draws $a$ and $b$ from $\mathcal{D}$) will have both small bias and large variance.

Lemma 15. Let $a(x)$ and $b(x)$ be two degree-$d$ polynomials drawn independently from $\mathcal{D}$, and let $c(x) = a(x)b(x)$. Then with probability at least $1 - 2^{-\Omega_d(n^d)}$ we have:

1. $|\hat{c}(\emptyset)| \le \frac{1}{4} C^{-d} \binom{n/2}{d}$, and

2. $\mathrm{Var}[c] = \sum_{|S| > 0} \hat{c}(S)^2 \ge \frac{1}{16} \binom{n/2}{d}^2$.

Suppose that Lemma 15 holds. Let $a(x)$ and $b(x)$ be independent draws from $\mathcal{D}$ and let $c(x) = a(x)b(x)$, which we assume satisfies the conclusions of the lemma. Then the constant term $\hat{c}(\emptyset)$ is small compared with the variance of $c(x)$. Let us rescale so the variance is 1; i.e. define the polynomial

$$e(x) := \frac{c(x)}{\mathrm{Var}[c]^{1/2}},$$

so $\mathrm{Var}[e] = 1$ and $|\hat{e}(\emptyset)| \le C^{-d}$. We now apply Theorem 7 to the degree-$2d$ polynomial $q(x) = -e(x) + \hat{e}(\emptyset)$, and we see that with probability at least $C^{-d}$ (over a random uniform draw of $x$) we have $-e(x) + \hat{e}(\emptyset) > C^{-d}$, and hence $\Pr_x[\mathrm{sign}(e(x)) < 0] \ge C^{-d}$.

We now observe that $\mathrm{sign}(e(x)) < 0$ if and only if $\mathrm{sign}(a(x)) \neq \mathrm{sign}(b(x))$, and consequently $\Pr_x[\mathrm{sign}(e(x)) < 0]$ is precisely $\mathrm{dist}(\mathrm{sign}(a), \mathrm{sign}(b))$. We thus have that for $a(x), b(x)$ drawn from $\mathcal{D}$ as described above, the probability that $\mathrm{dist}(\mathrm{sign}(a), \mathrm{sign}(b))$ is less than $C^{-d}$ is at most $2^{-\alpha_d n^d}$ for some constant $\alpha_d > 0$ (depending only on $d$).

Now let us consider $M = 2^{(\alpha_d/2) n^d}$ many independent draws of polynomials $a_1, a_2, \dots, a_M$ from $\mathcal{D}$. A union bound over all the $\binom{M}{2} < 2^{\alpha_d n^d}$ pairs $(a_i, a_j)$ with $1 \le i < j \le M$ gives that with nonzero probability, every $a_i, a_j$ pair satisfies $\mathrm{dist}(\mathrm{sign}(a_i), \mathrm{sign}(a_j)) \ge C^{-d}$. Thus there must be some outcome for the polynomials $a_1, a_2, \dots, a_M$ such that $\mathrm{dist}(\mathrm{sign}(a_i), \mathrm{sign}(a_j)) \ge C^{-d}$ for all $1 \le i < j \le M$. Setting $f_i = \mathrm{sign}(a_i)$ for this outcome, Theorem 10 is proved. □

It remains only for us to prove Lemma 15.

D.2.2 Proof of Lemma 15

Let us consider $a(x) = \sum_{|S| = d} \hat{a}(S)\chi_S(x)$ and $b(x) = \sum_{|S| = d} \hat{b}(S)\chi_S(x)$ drawn independently from $\mathcal{D}$. We will show that the bias of the polynomial $c(x) = a(x)b(x)$ fails to satisfy the bound in item 1 with probability $2^{-\Omega_d(n^d)}$. Then we show the variance of $c$ fails to satisfy item 2 with probability $2^{-\Omega_d(n^d)}$, and the lemma follows from a union bound.

To bound the bias of $c$, we begin by noting that:

$$\hat{c}(\emptyset) = \sum_{S \subseteq [n], |S| = d} \hat{a}(S)\hat{b}(S).$$

The terms $\hat{a}(S)\hat{b}(S)$ in the sum are i.i.d. and uniform in $\{-1, 1\}$. Define the random variable $X_S = 1/2 - (1/2)\hat{a}(S)\hat{b}(S)$. Then $X = \sum_S X_S$ is binomially distributed with $\hat{c}(\emptyset) = 2(\mathbf{E}[X] - X)$, and setting $t = \frac{1}{8} C^{-d} \left(\frac{n}{2d}\right)^d$, we may apply the Chernoff bound to obtain:

$$\Pr\left[\hat{c}(\emptyset) \le -\tfrac{1}{4} C^{-d} \left(\tfrac{n}{2d}\right)^d\right] = \Pr[X \ge \mathbf{E}[X] + t] \le \exp\left(-\frac{2t^2}{\binom{n}{d}}\right) = 2^{-\Omega_d(n^d)}.$$

The same analysis gives a bound on the magnitude in the other direction. Since $\binom{n/2}{d} \ge \left(\frac{n}{2d}\right)^d$, this concludes the analysis for the first item of the lemma.

Now we show that item 2 of the lemma also fails with very small probability. The following terminology will be useful. Let $T \subseteq [n]$ be a subset of size exactly $|T| = 2d$ (we think of $T$ as the set of variables defining some monomial of degree $2d$). For such a $T$ we let $\mathrm{first}(T) := T \cap [n/2]$ and $\mathrm{second}(T) := T \cap [n/2 + 1, n]$. We say that such a $T$ is balanced if $|\mathrm{first}(T)| = |\mathrm{second}(T)| = d$. Note that there are exactly $\binom{n/2}{d}^2$ many balanced subsets $T$.

We say that a subset $U \subseteq [n]$, $|U| = d$ is pure if $U$ is contained entirely in $[n/2 + 1, n]$.

Let us consider $a(x) = \sum_{|S| = d} \hat{a}(S)\chi_S(x)$ and $b(x) = \sum_{|S| = d} \hat{b}(S)\chi_S(x)$ drawn independently from $\mathcal{D}$. Fix any outcome for $a$ (i.e. for the values of all coefficients $\hat{a}(S)$), and fix any outcome for $\hat{b}(U)$ for every $U$ which is not pure. Thus the only "remaining randomness" is the value (drawn uniformly from $\{-1, 1\}$) for each of the $\binom{n/2}{d}$ coefficients $\hat{b}(U)$ for pure $U$. We will show that with probability at least $1 - 2^{-\Omega_d(n^d)}$ over the remaining randomness, at least $\frac{1}{16} \binom{n/2}{d}^2$ of the $\binom{n/2}{d}^2$ many balanced subsets $T$ have $\hat{c}(T) \neq 0$. Since each value $\hat{c}(T)$ which is nonzero is at least 1 in magnitude, this suffices to prove the lemma.

Consider any fixed pure subset $U \subseteq [n]$, $|U| = d$ (for example $U = \{n - d + 1, \dots, n\}$). Let $T$ be a balanced subset of $[n]$ (so $|T| = 2d$) such that $\mathrm{second}(T)$ equals $U$. (There are precisely $\binom{n/2}{d}$ balanced subsets $T$ with this property; let $\mathcal{T}_U$ denote the collection of all $\binom{n/2}{d}$ of them.) Consider the value $\hat{c}(T)$: this is

$$\hat{c}(T) = \sum_{S \subseteq T, |S| = d} \hat{a}(S)\hat{b}(T - S).$$

The only "not-yet-fixed" part of the above expression is the single coefficient $\hat{b}(U)$; everything else has been fixed. Since the coefficient $\hat{a}(T - U)$ of $\hat{b}(U)$ is a nonzero integer, there are two possible outcomes for the value of $\hat{c}(T)$, depending on whether $\hat{b}(U)$ is set to $+1$ or $-1$. These two possible values differ by 2; consequently, there is at most one possible outcome of $\hat{b}(U)$ that will cause $\hat{c}(T)$ to be zero. (Note that it may well be the case that no outcome for $\hat{b}(U)$ would cause $\hat{c}(T)$ to become zero.)
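
The effect of flipping the single coefficient $\hat{b}(U)$ on $\hat{c}(T)$ can be checked directly from the formula above. The sketch below uses hypothetical parameters ($n = 8$, $d = 2$, variables 0-indexed so that the second half is $\{4, \dots, 7\}$) and a hypothetical helper name; it is an illustration, not part of the proof.

```python
from itertools import combinations
import random

def product_coefficient(a, b, T, d):
    """hat c(T) = sum over S in T, |S| = d, of hat a(S) * hat b(T - S), for |T| = 2d."""
    T = frozenset(T)
    return sum(a[frozenset(S)] * b[T - frozenset(S)] for S in combinations(sorted(T), d))

# Two hypothetical draws from the distribution: uniform +-1 coefficients on d-sets.
random.seed(1)
n, d = 8, 2
sets = [frozenset(S) for S in combinations(range(n), d)]
a = {S: random.choice((-1, 1)) for S in sets}
b = {S: random.choice((-1, 1)) for S in sets}

U = frozenset({6, 7})          # pure: inside the second half {4, ..., 7}
T = frozenset({0, 1}) | U      # balanced, with second(T) = U
before = product_coefficient(a, b, T, d)
b[U] = -b[U]                   # flip the single coefficient hat b(U)
after = product_coefficient(a, b, T, d)
```

The values `before` and `after` differ by exactly 2 in magnitude, so at most one of them can be zero.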

Let us say that an outcome of $\hat{b}(U)$ is pernicious if it has the following property: at most $\frac{1}{4} \binom{n/2}{d}$ of the $\binom{n/2}{d}$ elements $T \in \mathcal{T}_U$ have $\hat{c}(T)$ take a nonzero value under that outcome of $\hat{b}(U)$. (Equivalently, at least $\frac{3}{4} \binom{n/2}{d}$ of the $\binom{n/2}{d}$ elements $T \in \mathcal{T}_U$ have $\hat{c}(T)$ become zero under that outcome of $\hat{b}(U)$.) It may be the case that neither outcome in $\{-1, 1\}$ for $\hat{b}(U)$ is pernicious (e.g. if each outcome makes at least 95% of the $\hat{c}(T)$ values come out nonzero). It cannot be the case that both outcomes $\{-1, 1\}$ for $\hat{b}(U)$ are pernicious (for if there were two pernicious outcomes, this would mean that at least $\frac{1}{2}$ of the $\hat{c}(T)$ values evaluate to 0 under both outcomes for $\hat{b}(U)$, but it is impossible for any $\hat{c}(T)$ to evaluate to 0 under two outcomes for $\hat{b}(U)$). Consequently we have

$$\Pr[\text{the outcome of } \hat{b}(U) \text{ is pernicious}] \le 1/2.$$

This is true independently for each of the $\binom{n/2}{d}$ many pure subsets $U$. As a result, a simple analysis gives

$$\Pr\left[\text{at least } 3/4 \text{ of the } \tbinom{n/2}{d} \text{ pure subsets } U \text{ have a pernicious outcome}\right] \le 2^{-\Omega_d(n^d)}.$$
Thus we may assume that fewer than $3/4$ of the $\binom{n/2}{d}$ pure subsets $U$ have a pernicious outcome. So at least $\frac{1}{4} \binom{n/2}{d}$ of the pure subsets $U$ are non-pernicious. For each such non-pernicious $U$, more than $\frac{1}{4} \binom{n/2}{d}$ of the $\binom{n/2}{d}$ elements in $\mathcal{T}_U$ have $\hat{c}(T)$ take a nonzero value. Consequently, at least $\frac{1}{16} \binom{n/2}{d}^2$ many balanced subsets $T$ overall have $\hat{c}(T) \neq 0$. This proves the lemma. □